Yi Yu
Last Updated: 2024/12/13
- Affiliations, Positions
- Hiroshima University, Associate
- Web Site
- E-mail
- yiyuhiroshima-u.ac.jp
Basic Information
Academic Degrees
- Nara Women’s University
- Nara Women’s University
Educational Activity
Course in Charge
- 2024, Liberal Arts Education Program1, 4Term, Intelligence and Computer
- 2024, Undergraduate Education, Intensive, Basics of AI
- 2024, Undergraduate Education, 1Term, Data Science Seminar I
- 2024, Undergraduate Education, 2Term, Data Science Seminar II
- 2024, Undergraduate Education, Second Semester, Graduation Thesis
- 2024, Graduate Education (Master's Program), 1Term, Special Exercises on Informatics and Data Science A
- 2024, Graduate Education (Master's Program), 2Term, Special Exercises on Informatics and Data Science A
- 2024, Graduate Education (Master's Program), 3Term, Special Exercises on Informatics and Data Science B
- 2024, Graduate Education (Master's Program), 4Term, Special Exercises on Informatics and Data Science B
- 2024, Graduate Education (Master's Program), Academic Year, Special Study on Informatics and Data Science
- 2024, Graduate Education (Doctoral Program), Academic Year, Special Study on Informatics and Data Science
Research Activities
Academic Papers
- Semantic enrichment for video question answering with gated graph neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 202404
- A scalable sparse transformer model for singing melody extraction, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 202404
- Syllable-level lyrics generation from melody exploiting character-level language model, European Chapter of the Association for Computational Linguistics (EACL), 202403
- Scalable motion style transfer with constrained diffusion generation, The Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI), 202402
- Semantic dependency network for lyrics generation from melody, Neural Computing and Applications, 20231209
- Detecting dialogue hallucination using graph neural networks, Association for Machine Learning and Applications (AMLA), 202312
- Emotionally enhanced talking face generation, 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, ACM MM, 2023, 202310
- Graph-based video-language learning with multi-grained audio-visual alignment, ACM Multimedia (MM), 3975-3984, 202310
- Stripe-Transformer: deep stripe feature learning for music source separation, EURASIP Journal on Audio, Speech, and Music Processing, 2023, 202310
- Controllable lyrics-to-melody generation, Neural Computing and Applications, 35(27), 19805-19819, 202309
- Multi-scale network with shared cross-attention for audio–visual correlation learning, Neural Computing and Applications, 35(27), 20173-20187, 20230719
- MFAE: Masked frame-level features autoencoder with hybrid-supervision for low-resource music transcription, IEEE International Conference on Multimedia and Expo (ICME), 1109-1114, 202307
- LC-Beating: An online system for beat and downbeat tracking using latency-controlled mechanism, 1098-1103, 202307
- Frame-level multi-label playing technique detection using multi-scale network and self-attention mechanism, 202304
- Variational Autoencoder with CCA for Audio–Visual Cross-modal Retrieval, ACM Transactions on Multimedia Computing, Communications, and Applications, 19(3s), 1-21, 20230224
- Melody-conditioned lyrics generation via fine-tuning language model and its evaluation with ChatGPT, CoRR, abs/2310.00863, 2023
- Controllable Lyrics-to-Melody Generation, CoRR, abs/2306.02613, 2023
- Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics, 2022 IEEE International Symposium on Multimedia (ISM), 236-239, 202212
- Melody Generation from Lyrics with Local Interpretability, ACM Transactions on Multimedia Computing, Communications, and Applications, 19(3), 1-21, 20221129
- Conditional Hybrid GAN for Melody Generation from Lyrics, Journal of Neural Computing and Applications, https://rdcu.be/cXazS, doi: 10.1007/s00521-022-07863-5, 2022, 202210
- Interpretable Melody Generation from Lyrics with Discrete-Valued Adversarial Training, ACM Multimedia (MM), 2022, 202210
- HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription, International Society for Music Information Retrieval Conference (ISMIR), 2022, 202210
- Playing Technique Detection by Fusing Note Onset Information in Guzheng Performance, accepted by International Society for Music Information Retrieval Conference (ISMIR), 2022, 202209
- A Neural Harmonic-Aware Network with Gated Attentive Fusion for Singing Melody Extraction, Journal of Neurocomputing, 202208
- Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation, ACM Transactions on Multimedia Computing, Communications, and Applications, 202208
- Multimodal Music Emotion Recognition with Hierarchical Cross-Modal Attention Network, IEEE International Conference on Multimedia and Expo (ICME), 2022, 202207
- HarmoF0: Logarithmic Scale Dilated Convolution for Pitch Estimation, IEEE International Conference on Multimedia and Expo (ICME), 2022, 202207
- Lightweight bimodal network for single-image super-resolution via symmetric CNN and recursive transformer, 202207
- Context-patch representation learning with adaptive neighbor embedding for robust face image super-resolution, IEEE Transactions on Multimedia, 202207
- DEEPCHORUS: A Hybrid Model of Multi-scale Convolution and Self-attention for Chorus Detection, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, 202204
- Feature Distillation Interaction Weighting Network for Lightweight Image Super-Resolution, 202202
- FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation, IEEE Transactions on Multimedia, 202201
- Learning compact and representative features for cross-modality person re-identification, Journal of World Wide Web, 202201
- Melody Generation from Lyrics Using Three Branch Conditional LSTM-GAN, 202112
- Towards Multi-domain Face Synthesis via Domain-Invariant Representations and Multi-level Feature Parts, 202112
- HANME: Hierarchical attention network for singing melody extraction, 28, 1006-1010, 202109
- Interpretable visual understanding with cognitive attention network, 202108
- Conditional LSTM-GAN for Melody Generation from Lyrics, ACM Transactions on Multimedia Computing, Communications, and Applications, 202102
- A Lightweight Network with Multi-Scale Context Fusion for Real-Time Semantic Segmentation, IEEE Transactions on Intelligent Transportation Systems, 28, 1006-1010, 2021
- MusicTM-Dataset for Joint Representation Learning Among Sheet Music, Lyrics, and Musical Audio, Lecture Notes in Electrical Engineering, 761 LNEE, 78-89, 2021
- Frequency-Temporal Attention Network for Singing Melody Extraction, accepted by ICASSP 2021, 202101
- Singer identification using deep timbre feature learning with KNN-net, accepted by ICASSP 2021, 202101
- C3VQG: Category Consistent Cyclic Visual Question Generation, ACM MM Asia, 2021
- Correlation Discrepancy Insight Network for Video Re-identification, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 202101
- Robust Facial Image Super-Resolution by Kernel Locality-Constrained Coupled-Layer Regression, ACM Transactions on Internet Technology (TOIT), 202011
- Constructing Multilayer Locality-Constrained Matrix Regression Framework for Noise Robust Face Super-Resolution, Journal of Pattern Recognition, 202010
- SIST: Online Scale-Adaptive Object tracking with Stepwise Insight, Neurocomputing, 384, 200-212, 20200407
- Lyrics-Conditioned Neural Melody Generation, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11962, 709-714, 2020
- Unsupervised Generative Adversarial Alignment Representation for Sheet music, Audio and Lyrics, Proceedings - 2020 IEEE 6th International Conference on Multimedia Big Data, BigMM 2020, 162-165, 2020
- PAI-BPR: Personalized outfit recommendation scheme with attribute-wise interpretability, 221-230, 2020
- A Relation learning hierarchical framework for multi-label charge prediction, 729-741, 2020
- End-to-end Named Entity Recognition from English Speech, Interspeech, 4268-4272, 2020
- Image Super-Resolution via Multi-view Information Fusion Networks, 402, 29-37, 2020
- Cross-resolution face recognition with pose variations via multilayer locality-constrained structural orthogonal procrustes regression, Journal of Information Sciences, 506, 19-36, 2020
- Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-Modal Retrieval, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 3(76), 1-23, 2020
- LBAN-IL: A Novel Method of High Discriminative Representation for Facial Expression Recognition, Journal of Neurocomputing, 432, 159-169, 2020
- Multi-scale Patch based Representation Feature Learning for Low-Resolution Face Recognition, Journal of Applied Soft Computing, 2020
- Research on Singing Voice Detection Based on a Long-term Recurrent Convolutional Network with Vocal Separation and Temporal Smoothing, Electronics in MDPI open access journal, 2020
- Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2020
- Personalized Music Recommendation with Triplet Network, DEIM Forum 2019, No.F8-5, 5p., 201903
- Graph-Regularized Locality-Constrained Joint Dictionary and Residual Learning for Face Sketch Synthesis, IEEE Transactions on Image Processing, 201902
- Category-Based Deep CCA for Fine-Grained Venue Discovery From Multimodal Data, IEEE Trans. Neural Netw. Learning Syst., 30(4), 1250-1258, 2019
- Incremental Re-identification by Cross-Direction and Cross-Ranking Adaption, accepted by IEEE Transactions on Multimedia 2019, 2019
- Face hallucination through differential evolution parameter map learning with facial structure prior, accepted by Journal of Information Sciences 2019, 2019
- Audio-Visual embedding for cross-modal music video retrieval through Supervised Deep CCA, accepted by IEEE ISM2018, 201812
- Deep Learning of Human Perception in Audio Event Classification, accepted by IEEE ISM2018, 201812
- Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing, accepted by the 2018 IEEE International Conference on Data Mining (ICDM'18), 201810
- Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval, ACM Transactions on Multimedia Computing, Communications, and Applications (ACMTOMM), 201810
- Ensemble Super-Resolution with A Reference Dataset, accepted by IEEE Transactions on Cybernetics 2018, 201810
- Context-Patch Face Hallucination based on Thresholding Locality-constrained Representation and Reproducing Learning, IEEE Transactions on Cybernetics, 201807
- Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination, International Joint Conference on Artificial Intelligence (IJCAI) 2018, 201807
- Residual Learning for Face Sketch Synthesis, ICASSP2018, 1952-1956, 201804
- Video-based Person Re-identification Self Paced Weighting, accepted by AAAI 2018, 201802
- Context-Patch Face Hallucination Based on Thresholding Locality-constrained Representation and Reproducing Learning, CoRR, abs/1809.00665, 2018
- Category-Based Deep CCA for Fine-Grained Venue Discovery from Multimodal Data, CoRR, abs/1805.02997, 2018
- VenueNet: Fine-Grained Venue Discovery by Deep Correlation Learning, Proceedings - 2017 IEEE International Symposium on Multimedia, ISM 2017, 2017-, 288-291, 20171228
- Compact LBP and WLBP Descriptor with Magnitude and Direction Difference for Face Recognition, accepted by IEEE International Conference on Image Processing (ICIP) 2017, 201709
- Person Re-identification via Discrepancy Matrix and Matrix Metric, accepted by IEEE Transactions on Cybernetics 2017, 201709
- Statistical Inference of Gaussian-Laplace Distribution for Person Verification, accepted by ACM Multimedia (MM) 2017, 201708
- Deep Multi-Label Hashing for Large-Scale Visual Search Based on Semantic Graph, accepted by APWeb-WAIM Joint Conference on Web and Big Data 2017, 201707
- Context-Patch based Face Hallucination via Thresholding Locality-Constrained Representation with Reproducing Learning, top 3% papers accepted by IEEE International Conference on Multimedia and Expo (ICME) 2017, 201707
- JSFox: Integrating static and dynamic type analysis of javascript programs, Proceedings - 2017 IEEE/ACM 39th International Conference on Software Engineering Companion, ICSE-C 2017, 256-258, 20170630
- A Query Refinement Framework for XML Keyword Search, Journal of World Wide Web (2017), doi:10.1007/s11280-017-0447-z, https://link.springer.com/article/10.1007/s11280-017-0447-z, 201703
- Spatial-Aware Collaborative Representation for Hyperspectral Remote Sensing Image Classification, IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 14(3), 404-408, 201703
- TAICHI Distance for Person Re-identification, accepted by IEEE International Conference on Acoustics, Speech and Signal Processing 2017, 201703
- 3A: A Person Re-identification System via Attribute Augmentation and Aggregation, accepted by IEEE International Conference on Acoustics, Speech and Signal Processing 2017, 201703
- Using psychoacoustic models for sound analysis in music, ACM International Conference Proceeding Series, 08-10-, 1-7, 20161208
- Person Reidentification via Ranking Aggregation of Similarity Pulling and Dissimilarity Pushing, IEEE TRANSACTIONS ON MULTIMEDIA, 18(12), 2553-2566, 201612
- Concept-level multimodal ranking of Flickr photo tags via recall based weighting, MMCommons 2016 - Proceedings of the 2016 ACM Workshop on the Multimedia COMMONS, co-located with ACM Multimedia 2016, 19-26, 20161016
- Leveraging multimodal information for event summarization and concept-level sentiment analysis, KNOWLEDGE-BASED SYSTEMS, 108, 102-109, 201609
- Fuzzy clustering of lecture videos based on topic modeling, Proceedings - International Workshop on Content-Based Multimedia Indexing, 2016-, 20160627
- Fuzzy Clustering of Lecture Videos Based on Topic Modeling, accepted by 14th Workshop on Content-Based Multimedia Indexing (CBMI), 2016, 201604
- Scale-adaptive Low-resolution Person Re-identification via Learning a Discriminating Surface, accepted by the 25th International Joint Conference on Artificial Intelligence (IJCAI), 2016, 201604
- Zero-Shot Person Re-identification via Cross-View Consistency, IEEE TRANSACTIONS ON MULTIMEDIA, 18(2), 260-272, 201602
- Predicting User Preference Based on Matrix Factorization by Exploiting Music Attributes, Ninth International C* Conference on Computer Science & Software Engineering (C3S2E), 2016, 2016
- Videopedia: Lecture video recommendation for educational blogs using topic modeling, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9516, 238-250, 2016
- Camera network based person re-identification by leveraging spatial-temporal constraint and multiple cameras relations, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9516, 174-186, 2016
- NEWSMAN: Uploading videos over adaptive middleboxes to news servers in weak network infrastructures, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9516, 100-113, 2016
- On Generating Content-Oriented Geo Features for Sensor-Rich Outdoor Video Search, IEEE TRANSACTIONS ON MULTIMEDIA, 17(10), 1760-1772, 201510
- Efficient geo-fencing via hybrid hashing: A combination of bucket selection and in-bucket binary search, accepted by ACM Transactions on Spatial Algorithms and Systems, 1(5), 201503
- TRACE: Linguistic-based Approach for Automatic Lecture Video Segmentation Leveraging Wikipedia Texts, 2015 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 217-220, 2015
- EventBuilder: Real-time Multimedia Event Summarization by Visualizing Social Media, MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 185-188, 2015
- Adaptive Margin Nearest Neighbor for Person Re-Identification, ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2015, PT I, 9314, 75-84, 2015
- Multi-Level Fusion for Person Re-identification with Incomplete Marks, MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 1267-1270, 2015
- Social interactions over location-aware multimedia systems, Multimedia Data Mining and Analytics: Disruptive Innovation, 117-146, 20150101
- Empirical observation of user activities: Check-ins, venue photos and tips in foursquare, WISMM 2014 - Proceedings of the 1st International Workshop on Internet-Scale Multimedia Management, Workshop of MM 2014, 31-34, 20141107
- Emerging topics on personalized and localized multimedia information systems, MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, 1233-1234, 20141103
- ATLAS: Automatic temporal segmentation and annotation of lecture videos based on modelling transition time, MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, 209-212, 20141103
- A Probabilistic Associative Model for Segmenting Weakly Supervised Images, IEEE TRANSACTIONS ON IMAGE PROCESSING, 23(9), 4150-4159, 201409
- Student performance evaluation of multimodal learning via a vector space model, in Proc. WISMM in ACM MM, 27-30, 2014
- User preference-aware video generation based on modeling scene moods, in Proc. ACM MMSys’14, 156-159, 2014
- ADVISOR - Personalized video soundtrack recommendation by late fusion with heuristic rankings, in Proc. ACM international conference on Multimedia (ACM MM’14), 607-616, 2014
- Scalable Content-Based Music Retrieval Using Chord Progression Histogram and Tree-Structure LSH, IEEE TRANSACTIONS ON MULTIMEDIA, 15(8), 1969-1981, 201312
- Edge-based locality sensitive hashing for efficient geo-fencing application, GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems, 566-569, 2013
- Social Interactions over geographic-aware multimedia systems, ACM international conference on Multimedia, 1115-1116, 2013
- Edge-based locality sensitive hashing for efficient geo-fencing application, in Proc. ACM SIGSPATIAL GIS, 586-589, 2013
- Query-document-dependent fusion: a case study of multimodal music retrieval, IEEE Transactions on Multimedia, 15(8), 1830-1842, 2013
- Automatic music soundtrack generation for outdoor videos from contextual sensor information, in Proc. ACM international conference on Multimedia, 1377-1378, 2012
- Recognition and Summarization of Chord Progressions and Their Application to Music Information Retrieval, 2012 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 9-16, 2012
- Recommender system for MIR research community, in Proc. JCDL, 409-410, 2010
- Combining multi-probe histogram and order-statistics based LSH for scalable audio content retrieval, in Proc. ACM international conference on Multimedia, 381-390, 2010
- Local summarization and multi-level LSH for retrieving multi-variant audio tracks, in Proc. ACM international conference on Multimedia, 341-350, 2009
- COSIN: Content-based retrieval system for cover songs, MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops, 987-988, 2008
- Using Exact Locality Sensitive Mapping to Group and Detect Audio-Based Cover Songs, ISM: 2008 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, 302-+, 2008
- Similarity searching techniques in content-based audio retrieval via hashing, ADVANCES IN MULTIMEDIA MODELING, PT 1, 4351(Part I), 397-407, 2007
- Scalable motion style transfer with constrained diffusion generation, vol. 38, no. 9, 10234-10242, 202402
- Anchor-aware deep metric learning for audio-visual retrieval, ACM ICMR 2024, 211-219, 202406
- Syllable-level lyrics generation from melody exploiting character-level language model, EACL 2024, 1336-1346, 202403
- HKDSME: Heterogeneous knowledge distillation for semi-supervised singing melody extraction using harmonic supervision, ACM Multimedia (MM) [CORE A*], 545-553, 202410
- Generalized news event discovery via dynamic augmentation and entropy optimization, ACM Multimedia (MM) [CORE A*], 10018-10026, 202410
- A Progressive Placeholder Learning Network for Multimodal Zero-Shot Learning, IEEE Transactions on Multimedia, 7933-7945, 202403
- Semantic dependency network for lyrics generation from melody, Journal of Neural Computing and Applications, Vol.36, Issue 8, 4059-4069, 202403
- Multi-scale network with shared cross attention for audio-visual correlation learning, Journal of Neural Computing and Applications, Vol. 35, 20173-20187, 2023
- Controllable lyrics-to-melody generation, Journal of Neural Computing and Applications, Volume 35, 19805-19819, 202309
- Stripe-Transformer: deep stripe feature learning for music source separation, EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2023, Issue 1, 20230112
- An efficient feature reuse distillation network for lightweight image super-resolution, COMPUTER VISION AND IMAGE UNDERSTANDING, 249, 202412
- Efficient Dual-Branch Information Interaction Network for Lightweight Image Super-Resolution, IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 73, 2024
- Controllable syllable-level lyrics generation from melody with prior attention, IEEE Transactions on Multimedia, Vol.26, 11083-11094, 202408