YU YIYI YU

Last Updated: 2025/11/04

Affiliation and Title
Graduate School of Advanced Science and Engineering, Associate Professor
Homepage

Basic Information

Academic Degrees

  • Master of Science (Nara Women's University)
  • Doctor of Information Science (Nara Women's University)

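Educational Activities

Courses Taught

  1. 2025, Liberal Arts Education, Term 4, Intelligence and Computers [Former Package]
  2. 2025, Undergraduate Specialized Education, Intensive, AI Fundamentals
  3. 2025, Liberal Arts Education, Term 1, Liberal Arts Seminar
  4. 2025, Undergraduate Specialized Education, Term 3, Information Science Exercises III (Intelligence Science Program)
  5. 2025, Undergraduate Specialized Education, Term 3, Speech Recognition
  6. 2025, Undergraduate Specialized Education, Term 1, Intelligence Science Seminar I
  7. 2025, Undergraduate Specialized Education, Term 2, Intelligence Science Seminar II
  8. 2025, Undergraduate Specialized Education, Second Semester, Graduation Thesis
  9. 2025, Master's Program / Doctoral Program (First Half), Term 3, Special Exercises in Information Science A
  10. 2025, Master's Program / Doctoral Program (First Half), Term 4, Special Exercises in Information Science A
  11. 2025, Master's Program / Doctoral Program (First Half), Term 1, Special Exercises in Information Science A
  12. 2025, Master's Program / Doctoral Program (First Half), Term 2, Special Exercises in Information Science A
  13. 2025, Master's Program / Doctoral Program (First Half), Term 3, Special Exercises in Information Science B
  14. 2025, Master's Program / Doctoral Program (First Half), Term 4, Special Exercises in Information Science B
  15. 2025, Master's Program / Doctoral Program (First Half), Academic Year, Special Research in Information Science
  16. 2025, Doctoral Program / Doctoral Program (Second Half), Academic Year, Special Research in Information Science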

Research Activities

Academic Papers (★ indicates representative papers)

  1. ★, DUDA: A Two-stage Decoupling Unsupervised Domain Adaptation Framework for Semi-supervised Singing Melody Extraction from Polyphonic Music, ACM Multimedia (MM), 2025., 202510
  2. Scene-guided attention network for spatial understanding in 3D scenes, accepted by IEEE International Conference on Multimedia Retrieval, 2025, 202507
  3. A Harmonic-Aware Fine-Tuning Approach for Beat Tracking, accepted by IEEE International Conference on Multimedia and Expo (ICME), 2025, 202507
  4. BeatFM: Improving Beat Tracking with Pre-trained Music Foundation Model, accepted by IEEE International Conference on Multimedia and Expo (ICME), 2025, 202507
  5. Chain-of-Thought Prompting with Causal Intervention for Multimodal Aspect-based Sentiment Analysis, International Conference on Database Systems for Advanced Applications (DASFAA), 2025, 202506
  6. A Mamba-based network for semi-supervised singing melody extraction using confidence binary regularization, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025., 202504
  7. Visual entity-centric prompting for knowledge retrieval in knowledge-based VQA, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025., 202504
  8. Enhancing video-text matching via sparse stratified sampling, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025., pp. 1-5, 202504
  9. KCE-UNET: A novel music denoising method with kanconv ECA unet, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025., 202504
  10. ★, Semantic frame aggregation-based Transformer for live video comment generation, accepted by IEEE Transactions on Multimedia, 2025., 202512
  11. Dialogue-to-video retrieval via multi-grained attention network, IEEE Access, 2025, vol. 13, no. 13, pp. 82302-82311, 202505
  12. ★, A survey of recent advances and challenges in deep audio-visual correlation learning, ACM Computing Surveys, 202505
  13. Adversarial Contrastive Autoencoder With Shared Attention for Audio-Visual Correlation Learning, IEEE ACCESS, vol. 13, pp. 39753-39764, 2025
  14. Enhancing semantic audio-visual representation learning with supervised multi-scale attention, PATTERN ANALYSIS AND APPLICATIONS, vol. 28, no. 2, 202506
  15. Semantic enrichment for video question answering with gated graph neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 202404
  16. A scalable sparse transformer model for singing melody extraction, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 202404
  17. Syllable-level lyrics generation from melody exploiting character-level language model, European Chapter of the Association for Computational Linguistics (EACL), 202403
  18. Scalable motion style transfer with constrained diffusion generation, The Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI), 202402
  19. Semantic dependency network for lyrics generation from melody, Neural Computing and Applications, 20231209
  20. Detecting dialogue hallucination using graph neural networks, Association for Machine Learning and Applications (AMLA), 202312
  21. Emotionally enhanced talking face generation, 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, ACM MM, 2023, 202310
  22. Graph-based video-language learning with multi-grained audio-visual alignment, ACM Multimedia (MM), pp. 3975-3984, 202310
  23. Stripe-Transformer: deep stripe feature learning for music source separation, EURASIP Journal on Audio, Speech, and Music Processing, 2023., 202310
  24. Controllable lyrics-to-melody generation, Neural Computing and Applications, vol. 35, no. 27, pp. 19805-19819, 202309
  25. Multi-scale network with shared cross-attention for audio–visual correlation learning, Neural Computing and Applications, vol. 35, no. 27, pp. 20173-20187, 20230719
  26. MFAE: Masked frame-level features autoencoder with hybrid-supervision for low-resource music Transcription, IEEE International Conference on Multimedia and Expo (ICME), pp. 1109-1114, 202307
  27. LC-Beating: An online system for beat and downbeat tracking using latency-controlled mechanism, pp. 1098-1103, 202307
  28. Frame-level multi-label playing technique detection using multi-scale network and self-attention mechanism, 202304
  29. Variational Autoencoder with CCA for Audio–Visual Cross-modal Retrieval, ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 19, no. 3s, pp. 1-21, 20230224
  30. Melody-conditioned lyrics generation via fine-tuning language model and its evaluation with ChatGPT, CoRR, abs/2310.00863, 2023
  31. Controllable Lyrics-to-Melody Generation, CoRR, abs/2306.02613, 2023
  32. Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics, 2022 IEEE International Symposium on Multimedia (ISM), pp. 236-239, 202212
  33. Melody Generation from Lyrics with Local Interpretability, ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 19, no. 3, pp. 1-21, 20221129
  34. Conditional Hybrid GAN for Melody Generation from Lyrics, Journal of Neural Computing and Applications, https://rdcu.be/cXazS , doi: 10.1007/s00521-022-07863-5, 2022., 202210
  35. Interpretable Melody Generation from Lyrics with Discrete-Valued Adversarial Training, ACM Multimedia (MM), 2022, 202210
  36. HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription, International Society for Music Information Retrieval Conference (ISMIR), 2022, 202210
  37. Playing Technique Detection by Fusing Note Onset Information in Guzheng Performance, accepted by International Society for Music Information Retrieval Conference (ISMIR), 2022, 202209
  38. A Neural Harmonic-Aware Network with Gated Attentive Fusion for Singing Melody Extraction, Journal of Neurocomputing, 202208
  39. Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation, ACM Transactions on Multimedia Computing, Communications, and Applications, 202208
  40. Multimodal Music Emotion Recognition with Hierarchical Cross-Modal Attention Network, IEEE International Conference on Multimedia and Expo (ICME), 2022, 202207
  41. HarmoF0: Logarithmic Scale Dilated Convolution for Pitch Estimation, IEEE International Conference on Multimedia and Expo (ICME), 2022, 202207
  42. Lightweight bimodal network for single-image super-resolution via symmetric CNN and recursive transformer, 202207
  43. Context-patch representation learning with adaptive neighbor embedding for robust face image super-resolution, IEEE Transactions on Multimedia, 202207
  44. DEEPCHORUS: A Hybrid Model of Multi-scale Convolution and Self-attention for Chorus Detection, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, 202204
  45. Feature Distillation Interaction Weighting Network for Lightweight Image Super-Resolution, 202202
  46. FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation, IEEE Transactions on Multimedia, 202201
  47. Learning compact and representative features for cross-modality person re-identification, Journal of World Wide Web, 202201
  48. Melody Generation from Lyrics Using Three Branch Conditional LSTM-GAN, International Conference on Multimedia Modeling, 2021, 202112
  49. MSCFNet: A Lightweight Network with Multi-Scale Context Fusion for Real-Time Semantic Segmentation, IEEE Transactions on Intelligent Transportation Systems, 202112
  50. Towards Multi-domain Face Synthesis via Domain-Invariant Representations and Multi-level Feature Parts, IEEE Transactions on Multimedia, 202112
  51. Adversarial learning with mask reconstruction for text-guided image inpainting, ACM MM, 2021, pp. 3464-3472, 202110
  52. HANME: Hierarchical attention network for singing melody extraction, vol. 28, pp. 1006-1010, 202109
  53. Interpretable visual understanding with cognitive attention network, International Conference on Artificial Neural Networks (ICANN) 2021, 202108
  54. Conditional LSTM-GAN for Melody Generation from Lyrics, ACM Transactions on Multimedia Computing, Communications, and Applications, 202102
  55. A Lightweight Network with Multi-Scale Context Fusion for Real-Time Semantic Segmentation, IEEE Transactions on Intelligent Transportation Systems, vol. 28, pp. 1006-1010, 2021
  56. MusicTM-Dataset for Joint Representation Learning Among Sheet Music, Lyrics, and Musical Audio, Lecture Notes in Electrical Engineering, vol. 761 LNEE, pp. 78-89, 2021
  57. Frequency-Temporal Attention Network for Singing Melody Extraction, accepted by ICASSP 2021, 202101
  58. Singer identification using deep timbre feature learning with KNN-net, accepted by ICASSP 2021, 202101
  59. C3VQG: Category Consistent Cyclic Visual Question Generation, ACM MM Asia, 2021
  60. Correlation Discrepancy Insight Network for Video Re-identification, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 202101
  61. Robust Facial Image Super-Resolution by Kernel Locality-Constrained Coupled-Layer Regression, ACM Transactions on Internet Technology (TOIT), 202011
  62. Constructing Multilayer Locality-Constrained Matrix Regression Framework for Noise Robust Face Super-Resolution, Journal of Pattern Recognition, 202010
  63. SIST: Online Scale-Adaptive Object tracking with Stepwise Insight, Neurocomputing, vol. 384, pp. 200-212, 20200407
  64. Lyrics-Conditioned Neural Melody Generation, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11962, pp. 709-714, 2020
  65. Unsupervised Generative Adversarial Alignment Representation for Sheet music, Audio and Lyrics, Proceedings - 2020 IEEE 6th International Conference on Multimedia Big Data, BigMM 2020, pp. 162-165, 2020
  66. PAI-BPR: Personalized outfit recommendation scheme with attribute-wise interpretability, pp. 221-230, 2020
  67. A Relation learning hierarchical framework for multi-label charge prediction, the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 729-741, 2020
  68. End-to-end Named Entity Recognition from English Speech, Interspeech, pp. 4268-4272, 2020
  69. Image Super-Resolution via Multi-view Information Fusion Networks, vol. 402, pp. 29-37, 2020
  70. Cross-resolution face recognition with pose variations via multilayer locality-constrained structural orthogonal procrustes regression, Journal of Information Sciences, vol. 506, pp. 19-36, 2020
  71. Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-Modal Retrieval, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), vol. 3, no. 76, pp. 1-23, 2020
  72. LBAN-IL: A Novel Method of High Discriminative Representation for Facial Expression Recognition, Journal of Neurocomputing, vol. 432, pp. 159-169, 2020
  73. Multi-scale Patch based Representation Feature Learning for Low-Resolution Face Recognition, Journal of applied Soft Computing, 2020
  74. Research on Singing Voice Detection Based on a Long-term Recurrent Convolutional Network with Vocal Separation and Temporal Smoothing, Electronics in MDPI open access journal, 2020
  75. Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2020
  76. Personalized Music Recommendation with Triplet Network, DEIM Forum 2019, pp. No.F8-5, 5p., 201903
  77. Graph-Regularized Locality-Constrained Joint Dictionary and Residual Learning for Face Sketch Synthesis, IEEE Transactions on Image Processing, 201902
  78. Category-Based Deep CCA for Fine-Grained Venue Discovery From Multimodal Data, IEEE Trans. Neural Netw. Learning Syst., vol. 30, no. 4, pp. 1250-1258, 2019
  79. Incremental Re-identification by Cross-Direction and Cross-Ranking Adaption, accepted by IEEE Transactions on Multimedia 2019, 2019
  80. Face hallucination through differential evolution parameter map learning with facial structure prior, accepted by Journal of Information Sciences 2019, 2019
  81. Audio-Visual embedding for cross-modal music video retrieval through Supervised Deep CCA, accepted by IEEE ISM2018, 201812
  82. Deep Learning of Human Perception in Audio Event Classification, accepted by IEEE ISM2018, 201812
  83. Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing, accepted by the 2018 IEEE International Conference on Data Mining (ICDM'18)., 201810
  84. Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval, ACM Transactions on Multimedia Computing, Communications, and Applications (ACMTOMM), 201810
  85. Ensemble Super-Resolution with A Reference Dataset, accepted by IEEE Transactions on Cybernetics 2018, 201810
  86. Context-Patch Face Hallucination based on Thresholding Locality-constrained Representation and Reproducing Learning, IEEE Transactions on Cybernetics, 201807
  87. Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination, International Joint Conference on Artificial Intelligence (IJCAI) 2018, 201807
  88. Residual Learning for Face Sketch Synthesis, ICASSP2018, pp. 1952-1956, 201804
  89. Video-based Person Re-identification Self Paced Weighting, accepted by AAAI 2018, 201802
  90. Context-Patch Face Hallucination Based on Thresholding Locality-constrained Representation and Reproducing Learning, CoRR, abs/1809.00665, 2018
  91. Category-Based Deep CCA for Fine-Grained Venue Discovery from Multimodal Data, CoRR, abs/1805.02997, 2018
  92. VenueNet: Fine-Grained Venue Discovery by Deep Correlation Learning, Proceedings - 2017 IEEE International Symposium on Multimedia, ISM 2017, vol. 2017-, pp. 288-291, 20171228
  93. Compact LBP and WLBP Descriptor with Magnitude and Direction Difference for Face Recognition, accepted by IEEE International Conference on Image Processing (ICIP) 2017., 201709
  94. Person Re-identification via Discrepancy Matrix and Matrix Metric, accepted by IEEE Transactions on Cybernetics 2017, 201709
  95. Statistical Inference of Gaussian-Laplace Distribution for Person Verification, accepted by ACM Multimedia (MM) 2017., 201708
  96. Deep Multi-Label Hashing for Large-Scale Visual Search Based on Semantic Graph, accepted by APWeb-WAIM Joint Conference on Web and Big Data 2017., 201707
  97. Context-Patch based Face Hallucination via Thresholding Locality-Constrained Representation with Reproducing Learning, top 3% papers accepted by IEEE International Conference on Multimedia and Expo(ICME) 2017., 201707
  98. JSFox: Integrating static and dynamic type analysis of javascript programs, Proceedings - 2017 IEEE/ACM 39th International Conference on Software Engineering Companion, ICSE-C 2017, pp. 256-258, 20170630
  99. A Query Refinement Framework for XML Keyword Search, Journal of World Wide Web (2017). doi:10.1007/s11280-017-0447-z, https://link.springer.com/article/10.1007/s11280-017-0447-z, 201703
  100. Spatial-Aware Collaborative Representation for Hyperspectral Remote Sensing Image Classification, IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, vol. 14, no. 3, pp. 404-408, 201703
  101. TAICHI Distance for Person Re-identification, accepted by IEEE International Conference on Acoustics, Speech and Signal Processing 2017, 201703
  102. 3A: A Person Re-identification System via Attribute Augmentation and Aggregation, accepted by IEEE International Conference on Acoustics, Speech and Signal Processing 2017, 201703
  103. TAICHI Distance for Person Re-identification, accepted by IEEE International Conference on Acoustics, Speech and Signal Processing 2017., 201702
  104. Using psychoacoustic models for sound analysis in music, ACM International Conference Proceeding Series, vol. 08-10-, pp. 1-7, 20161208
  105. Person Reidentification via Ranking Aggregation of Similarity Pulling and Dissimilarity Pushing, IEEE TRANSACTIONS ON MULTIMEDIA, vol. 18, no. 12, pp. 2553-2566, 201612
  106. Concept-level multimodal ranking of Flickr photo tags via recall based weighting, MMCommons 2016 - Proceedings of the 2016 ACM Workshop on the Multimedia COMMONS, co-located with ACM Multimedia 2016, pp. 19-26, 20161016
  107. Leveraging multimodal information for event summarization and concept-level sentiment analysis, KNOWLEDGE-BASED SYSTEMS, vol. 108, pp. 102-109, 201609
  108. Fuzzy clustering of lecture videos based on topic modeling, Proceedings - International Workshop on Content-Based Multimedia Indexing, vol. 2016-, 20160627
  109. Fuzzy Clustering of Lecture Videos Based on Topic Modeling, accepted by 14th Workshop on Content-Based Multimedia Indexing (CBMI), 2016., 201604
  110. Scale-adaptive Low-resolution Person Re-identification via Learning a Discriminating Surface, accepted by the 25th International Joint Conference on Artificial Intelligence (IJCAI), 2016., 201604
  111. Zero-Shot Person Re-identification via Cross-View Consistency, IEEE TRANSACTIONS ON MULTIMEDIA, vol. 18, no. 2, pp. 260-272, 201602
  112. Predicting User Preference Based on Matrix Factorization by Exploiting Music Attributes, Ninth International C* Conference on Computer Science & Software Engineering (C3S2E), 2016, 2016
  113. Videopedia: Lecture video recommendation for educational blogs using topic modeling, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9516, pp. 238-250, 2016
  114. Camera network based person re-identification by leveraging spatial-temporal constraint and multiple cameras relations, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9516, pp. 174-186, 2016
  115. NEWSMAN: Uploading videos over adaptive middleboxes to news servers in weak network infrastructures, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9516, pp. 100-113, 2016
  116. On Generating Content-Oriented Geo Features for Sensor-Rich Outdoor Video Search, IEEE TRANSACTIONS ON MULTIMEDIA, vol. 17, no. 10, pp. 1760-1772, 201510
  117. Efficient geo-fencing via hybrid hashing: A combination of bucket selection and in-bucket binary search, accepted by ACM Transactions on Spatial Algorithms and Systems, vol. 1, no. 5, 201503
  118. TRACE: Linguistic-based Approach for Automatic Lecture Video Segmentation Leveraging Wikipedia Texts, 2015 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), pp. 217-220, 2015
  119. EventBuilder: Real-time Multimedia Event Summarization by Visualizing Social Media, MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, pp. 185-188, 2015
  120. Adaptive Margin Nearest Neighbor for Person Re-Identification, ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2015, PT I, vol. 9314, pp. 75-84, 2015
  121. Multi-Level Fusion for Person Re-identification with Incomplete Marks, MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, pp. 1267-1270, 2015
  122. Social interactions over location-aware multimedia systems, Multimedia Data Mining and Analytics: Disruptive Innovation, pp. 117-146, 20150101
  123. Empirical observation of user activities: Check-ins, venue photos and tips in foursquare, WISMM 2014 - Proceedings of the 1st International Workshop on Internet-Scale Multimedia Management, Workshop of MM 2014, pp. 31-34, 20141107
  124. Emerging topics on personalized and localized multimedia information systems, MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, pp. 1233-1234, 20141103
  125. ATLAS: Automatic temporal segmentation and annotation of lecture videos based on modelling transition time, in Proc. ACM international conference on Multimedia, pp. 209-212, 20141103
  126. A Probabilistic Associative Model for Segmenting Weakly Supervised Images, IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 23, no. 9, pp. 4150-4159, 201409
  127. Student performance evaluation of multimodal learning via a vector space model, in Proc. WISMM in ACM MM, pp. 27-30, 2014
  128. User preference-aware video generation based on modeling scene moods, in Proc. ACM MMSys’14, pp. 156-159, 2014
  129. ADVISOR - Personalized video soundtrack recommendation by late fusion with heuristic rankings, in Proc. ACM international conference on Multimedia (ACM MM’14), pp. 607-616, 2014
  130. Scalable Content-Based Music Retrieval Using Chord Progression Histogram and Tree-Structure LSH, IEEE TRANSACTIONS ON MULTIMEDIA, vol. 15, no. 8, pp. 1969-1981, 201312
  131. Edge-based locality sensitive hashing for efficient geo-fencing application, GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems, pp. 566-569, 2013
  132. Social Interactions over geographic-aware multimedia systems, ACM international conference on Multimedia, pp. 1115-1116, 2013
  133. Edge-based locality sensitive hashing for efficient geo-fencing application, in Proc. ACM SIGSPATIAL GIS, pp. 586-589, 2013
  134. Query-document-dependent fusion: a case study of multimodal music retrieval, IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 1830-1842, 2013
  135. Automatic music soundtrack generation for outdoor videos from contextual sensor information, in Proc. ACM international conference on Multimedia, pp. 1377-1378, 2012
  136. Recognition and Summarization of Chord Progressions and Their Application to Music Information Retrieval, 2012 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), pp. 9-16, 2012
  137. Recommender system for MIR research community, in Proc. JCDL, pp. 409-410, 2010
  138. Combining multi-probe histogram and order-statistics based LSH for scalable audio content retrieval, in Proc. ACM international conference on Multimedia, pp. 381-390, 2010
  139. Local summarization and multi-level LSH for retrieving multi-variant audio tracks, in Proc. ACM international conference on Multimedia, pp. 341-350, 2009
  140. COSIN: Content-based retrieval system for cover songs, MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops, pp. 987-988, 2008
  141. Using Exact Locality Sensitive Mapping to Group and Detect Audio-Based Cover Songs, ISM: 2008 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, pp. 302-+, 2008
  142. Similarity searching techniques in content-based audio retrieval via hashing, ADVANCES IN MULTIMEDIA MODELING, PT 1, vol. 4351, Part I, pp. 397-407, 2007
  143. Scalable motion style transfer with constrained diffusion generation, AAAI, vol. 38, no. 9, pp. 10234-10242, 202402
  144. Anchor-aware deep metric learning for audio-visual retrieval, ACM ICMR 2024, pp. 211-219, 202406
  145. Syllable-level lyrics generation from melody exploiting character-level language model, EACL 2024, pp. 1336-1346, 202403
  146. HKDSME: Heterogeneous knowledge distillation for semi-supervised singing melody extraction using harmonic supervision, ACM Multimedia (MM) [CORE A*], pp. 545-553, 202410
  147. Generalized news event discovery via dynamic augmentation and entropy optimization, ACM Multimedia (MM) [CORE A*], pp. 10018-10026, 202410
  148. A Progressive Placeholder Learning Network for Multimodal Zero-Shot Learning, IEEE Transactions on Multimedia, pp. 7933-7945, 202403
  149. Semantic dependency network for lyrics generation from melody, Journal of Neural Computing and Applications, vol. 36, no. 8, pp. 4059-4069, 202403
  150. Multi-scale network with shared cross attention for audio-visual correlation learning, Journal of Neural Computing and Applications, vol. 35, pp. 20173-20187, 2023
  151. Controllable lyrics-to-melody generation, Journal of Neural Computing and Applications, vol. 35, pp. 19805-19819, 202309
  152. Stripe-Transformer: deep stripe feature learning for music source separation, EURASIP Journal on Audio, Speech, and Music Processing, vol. 2023, no. 1, 20230112
  153. An efficient feature reuse distillation network for lightweight image super-resolution, COMPUTER VISION AND IMAGE UNDERSTANDING, vol. 249, 202412
  154. Efficient Dual-Branch Information Interaction Network for Lightweight Image Super-Resolution, IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, vol. 73, 2024
  155. Controllable syllable-level lyrics generation from melody with prior attention, IEEE Transactions on Multimedia, vol. 26, pp. 11083-11094, 202408