Yasuaki Itou

Last Updated: 2021/04/06

Affiliations, Positions
Graduate School of Advanced Science and Engineering, Associate Professor
E-mail
yasuaki@hiroshima-u.ac.jp

Basic Information

Academic Degrees

  • Doctor of Engineering, Hiroshima University
  • Master of Information Science, Japan Advanced Institute of Science and Technology, Hokuriku

Research Fields

  • Informatics; Computing Technologies; Software

Research Keywords

  • Parallel processing
  • Reconfigurable computing
  • GPGPU
  • FPGA

Educational Activity

Course in Charge

  1. 2021, Undergraduate Education, First Semester, Programming III
  2. 2021, Undergraduate Education, Term 3, Operating Systems
  3. 2021, Graduate Education (Doctoral Program), Academic Year, Special Study on Informatics and Data Science
  4. 2021, Graduate Education (Master's Program), Term 1, Special Exercises on Informatics and Data Science A
  5. 2021, Graduate Education (Master's Program), Term 2, Special Exercises on Informatics and Data Science A
  6. 2021, Graduate Education (Master's Program), Term 3, Special Exercises on Informatics and Data Science B
  7. 2021, Graduate Education (Master's Program), Term 4, Special Exercises on Informatics and Data Science B
  8. 2021, Graduate Education (Master's Program), Academic Year, Special Study on Informatics and Data Science
  9. 2021, Graduate Education (Master's Program), Term 1, Embedded System

Research Activities

Academic Papers

  1. FM screening by the local exhaustive search, with hardware acceleration, International Journal of Foundations of Computer Science, 16(1), 89-104, 200502
  2. An energy efficient leader election protocol for radio network with a single transceiver, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E89A(5), 1355-1361, 200612
  3. Efficient hardware algorithms for N choose K counters using the bitonic merger, International Journal of Foundations of Computer Science, 18(3), 517-528, 200706
  4. A New FM Screening Method to Generate Cluster-Dot Binary Images Using the Local Exhaustive Search with FPGA Acceleration, International Journal of Foundations of Computer Science, 19(6), 1373-1386, 200812
  5. Low-Latency Connected Component Labeling Using an FPGA, International Journal of Foundations of Computer Science, 21(3), 405-425, 201006
  6. Efficient Exhaustive Verification of the Collatz Conjecture using DSP blocks of Xilinx FPGAs, International Journal of Networking and Computing, 1(1), 49-62, 201101
  7. An Efficient Parallel Sorting Compatible with the Standard qsort, International Journal of Foundations of Computer Science, 22(5), 1057-1071, 201108
  8. A Graph Rewriting Approach for Converting Asynchronous ROMs into Synchronous Ones, IEICE Transactions on Information and Systems, E94D(12), 2378-2388, 201112
  9. A GPU Implementation of Dynamic Programming for the Optimal Polygon Triangulation, IEICE Transactions on Information and Systems, E96D(12), 2596-2603, 201312
  10. Offline Permutation Algorithms on the Discrete Memory Machine with Performance Evaluation on the GPU, IEICE Transactions on Information and Systems, E96D(12), 2617-2625, 201312
  11. A Classification Processor for a Support Vector Machine with embedded DSP slices and block RAMs in the FPGA, in Proc. of the IEEE 7th International Symposium on Embedded Multicore SoCs (MCSoC), 91-96, 201309
  12. A Flexible-Length-Arithmetic Processor Using Embedded DSP Slices and Block RAMs in FPGAs, in Proc. of International Symposium on Computing and Networking (CANDAR), 75-84, 201312
  13. Accelerating computation of Euclidean distance map using the GPU with Efficient memory access, International Journal of Parallel, Emergent and Distributed Systems, 28(5), 383-406, 2013
  14. An Efficient Implementation of the Hough Transform using DSP slices and block RAMs on the FPGA, in Proc. of the IEEE 7th International Symposium on Embedded Multicore SoCs (MCSoC), 85-90, 201309
  15. An FPGA implementation for neural networks with the FDFM processor core approach, International Journal of Parallel, Emergent and Distributed Systems, 28(4), 308-320, 2013
  16. An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU implementation, in Proc. of 2013 International Conference on Parallel Processing (ICPP), 1-10, 20131001
  17. ASCII Art Generation using the Local Exhaustive Search on the GPU, in Proc. of International Symposium on Computing and Networking (CANDAR), 194-200, 201312
  18. Efficient Hough Transform on the FPGA using DSP slices and Block RAMs, in Proc. of Workshop on Advances in Parallel and Distributed Computational Models (APDCM), 771-778, 20130501
  19. Implementations of the Hough Transform on the Embedded Multicore Processors, International Journal of Networking and Computing (IJNC), 4(1), 174-188, 20140101
  20. Template Matching using DSP slices on the FPGA, in Proc. of International Symposium on Computing and Networking (CANDAR), 338-344, 201312
  21. The Approximate String Matching on the Hierarchical Memory Machine, with Performance Evaluation, in Proc. of the IEEE 7th International Symposium on Embedded Multicore SoCs (MCSoC), 79-84, 201309
  22. The Random Address Shift to Reduce the Memory Access Congestion on the Discrete Memory Machine, in Proc. of International Symposium on Computing and Networking (CANDAR), 95-103, 201312
  23. TinyCSE: Tiny Computer System for Education, in Proc. of International Symposium on Computing and Networking (CANDAR), 639-641, 201312
  24. Offline Permutation on the CUDA-enabled GPU, IEICE Transactions on Information and Systems, E97D(12), 3052-3062, 201412
  25. An Optimal Implementation of the Approximate String Matching on the Hierarchical Memory Machine, with Performance Evaluation on the GPU, IEICE Transactions on Information and Systems, E97D(12), 3063-3071, 201412
  26. An RSA Encryption Hardware Algorithm using a Single DSP Block and a Single Block RAM on the FPGA, International Journal of Networking and Computing, 1(2), 277-289, 201107
  27. Accelerating the CKY parsing using FPGAs, IEICE Transactions on Information and Systems, E86-D(5), 803-810, 200305
  28. Instance-Specific Solutions to Accelerate the CKY Parsing for Large Context-free Grammars, International Journal of Foundations of Computer Science, 15(2), 403-416, 200404
  29. Implementations of a Parallel Algorithm for Computing Euclidean Distance Map in Multicore Processors and GPUs, International Journal of Networking and Computing, 1(2), 260-276, 201107
  30. The Parallel FDFM Processor Core Approach for CRT-based RSA Decryption, International Journal of Networking and Computing, 2(1), 79-96, 201201
  31. An Algorithm to Obtain Circuits with Synchronous RAMs, Journal of Communication and Computer, 9(5), 547-559, 201212
  32. A Rewriting Approach to Replace Asynchronous ROMs with Synchronous Ones for the Circuits with Cycles, International Journal of Networking and Computing, 2(1), 269-290, 201207
  33. Accelerating ant colony optimisation for the travelling salesman problem on the GPU, International Journal of Parallel, Emergent and Distributed Systems, 29(4), 401-420, 20140801
  34. Bulk Execution of Oblivious Algorithms on the Unified Memory Machine, with GPU Implementation, Proc. of International Parallel and Distributed Processing Symposium Workshops, 586-595, 20140519
  35. An Efficient Implementation of the Gradient-based Hough Transform using DSP slices and block RAMs on the FPGA, Proc. of International Parallel and Distributed Processing Symposium Workshops, 762-770, 20140519
  36. C2CU : A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm, Proc. of International Conference on Algorithms and Architectures for Parallel Processing, 178-191, 201408
  37. GPU-accelerated Verification of the Collatz Conjecture, Proc. of International Conference on Algorithms and Architectures for Parallel Processing, 483-496, 201408
  38. A GPU Implementation of Clipping-Free Halftoning using the Direct Binary Search, Proc. of International Conference on Algorithms and Architectures for Parallel Processing, 57-70, 201408
  39. Random Address Permute Shift Technique for the Shared Memory on GPUs, Proc. of International Conference on Parallel Processing Workshops, 429-483, 201409
  40. Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations, Proc. of International Conference on Parallel Processing, 251-250, 201409
  41. Thorough Evaluation of GPU Shared Memory Load and Store Instructions, in Proc. of International Symposium on Computing and Networking, 614-616, 201412
  42. An Efficient Implementation of the One-Dimensional Hough Transform Algorithm for Circle Detection on the FPGA, in Proc. of International Symposium on Computing and Networking, 447-452, 201412
  43. Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU implementation, Proc. of International Conference on Parallel, Distributed and Network-Based Processing, 626-634, 201503
  44. A character art generator using the local exhaustive search, with GPU acceleration, International Journal of Parallel, Emergent and Distributed Systems, 31(1), 47-63, 201601
  45. Bulk execution of Euclidean algorithms on the CUDA-enabled GPU, International Journal of Networking and Computing, 6(1), 42-63, 201601
  46. Bulk GCD Computation Using a GPU to Break Weak RSA Keys, Proc. of International Parallel and Distributed Processing Symposium Workshops, 385-394, 201505
  47. GPU-accelerated Digital Halftoning by the Local Exhaustive Search, Proc. of the 14th International Symposium on Parallel and Distributed Computing, 82-87, 201506
  48. Optimal Parallel Hardware K-Sorter and TopK-Sorter, with FPGA implementations, Proc. of the 14th International Symposium on Parallel and Distributed Computing, 138-147, 201506
  49. Parallel FDFM Approach for Computing GCDs Using the FPGA, Proc. of 11th International Conference of Parallel Processing and Applied Mathematics, 238-247, 201509
  50. A Parallel Algorithm for LZW decompression, with GPU implementation, Proc. of 11th International Conference of Parallel Processing and Applied Mathematics, 228-237, 201509
  51. Fast LZW compression using a GPU, Proc. of International Symposium on Computing and Networking, 303-308, 201512
  52. A Warp-synchronous Implementation for Multiple-length Multiplication on the GPU, Proc. of International Symposium on Computing and Networking, 96-102, 201512
  53. A Fast Approximate String Matching Algorithm on GPU, Proc. of International Symposium on Computing and Networking, 188-192, 201512
  54. Parallelization Techniques for Error Diffusion with GPU Implementations, Proc. of International Symposium on Computing and Networking, 30-39, 201512
  55. A flexible-length-arithmetic processor based on FDFM approach in FPGAs, Proc. of International Symposium on Computing and Networking, 364-370, 201512
  56. Efficient GPU implementations for the Conway's Game of Life, Proc. of International Symposium on Computing and Networking, 11-20, 201512
  57. Accelerating digital halftoning using the local exhaustive search on the GPU, Concurrency and Computation: Practice and Experience, published online, 20160212
  58. An FPGA Implementation for a Flexible-Length-Arithmetic Processor Employing the FDFM Processor Core Approach, IEICE Transactions on Information and Systems, E99D(12), 2901-2910, 201612
  59. Fully Parallelized LZW Decompression for CUDA-Enabled GPUs, IEICE Transactions on Information and Systems, E99D(12), 2986-2994, 201612
  60. A Memory-Access-Efficient Implementation for Computing the Approximate String Matching Algorithm on GPUs, IEICE Transactions on Information and Systems, E99D(12), 2995-3003, 201612
  61. GPU-Accelerated Bulk Execution of Multiple-Length Multiplication with Warp-Synchronous Programming Technique, IEICE Transactions on Information and Systems, E99D(12), 3004-3012, 201612
  62. Fast Simulation of Conway's Game of Life Using Bitwise Parallel Bulk Computation on a GPU, International Journal of Foundations of Computer Science, 27(8), 981-1003, 201612
  63. GPU-accelerated Exhaustive Verification of the Collatz Conjecture, International Journal of Networking and Computing, 7(1), 69-85, 201701
  64. Efficient Implementation of FDFM Approach for Euclidean Algorithms on the FPGA, International Journal of Networking and Computing, 6(2), 420-435, 201607
  65. Light Loss-Less Data Compression, with GPU implementation, Proc. of the 16th International Conference on Algorithms and Architectures for Parallel Processing, 281-294, 201612
  66. An Efficient Implementation of LZW Compression in the FPGA, Proc. of the 16th International Conference on Algorithms and Architectures for Parallel Processing, 512-520, 201612
  67. Accelerating Ant Colony Optimization for the Vertex Coloring Problem on the GPU, Proc. of International Symposium on Computing and Networking, 469-475, 201612
  68. A Memory-Access-Efficient Implementation of the Approximate String Matching Algorithm on GPU, Proc. of International Symposium on Computing and Networking, 483-489, 201612
  69. A hardware sorter for almost sorted sequences, with FPGA implementations, Proc. of International Symposium on Computing and Networking, 565-571, 201612
  70. An Evaluation of the Parallella Architecture for the Convex Hull Computation, Proc. of International Symposium on Computing and Networking, 704-706, 201612
  71. GPU-Accelerated Bulk Computation of the Eigenvalue Problem for Many Small Real Non-symmetric Matrices, Proc. of International Symposium on Computing and Networking, 490-496, 201612
  72. Bitwise Parallel Bulk Computation on the GPU, with Application to the CKY Parsing for Context-Free Grammars, Proc. of International Parallel and Distributed Processing Symposium Workshops, 589-598, 201605
  73. An Efficient Implementation of LZW Decompression in the FPGA, Proc. of International Parallel and Distributed Processing Symposium Workshops, 599-607, 201605
  74. C2CU: a CUDA C program generator for bulk execution of a sequential algorithm, Concurrency and Computation: Practice and Experience, 29(17), e4022, 20170910
  75. Adaptive loss-less data compression method optimized for GPU decompression, Concurrency and Computation: Practice and Experience, 29(24), e4283, 20171225
  76. An Efficient GPU Implementation of CKY Parsing Using the Bitwise Parallel Bulk Computation Technique, IEICE Transactions on Information and Systems, E100D(12), 2857-2865, 201712
  77. Almost optimal column-wise prefix-sum computation on the GPU, Journal of Supercomputing, 74(4), 1510-1521, 201804
  78. An Efficient GPU Implementation of Bulk Computation of the Eigenvalue Problem for Many Small Real Non-symmetric Matrices, International Journal of Networking and Computing, 7(2), 227-247, 201707
  79. Single Kernel Soft Synchronization Technique for Task Arrays on CUDA-enabled GPUs, Proc. of International Symposium on Computing and Networking, 11-20, 201711
  80. A Square Pointillism Image Generation, and its GPU Acceleration, Proc. of International Symposium on Computing and Networking, 38-47, 201711
  81. A Hybrid Architecture for the Approximate String Matching on an FPGA, Proc. of International Symposium on Computing and Networking, 48-57, 201711
  82. A GPU Implementation of Bulk Execution of the Dynamic Programming for the Optimal Polygon Triangulation, Proc. of 12th International Conference of Parallel Processing and Applied Mathematics, 314-323, 201709
  83. Almost Optimal Column-wise Prefix-sum Computation on the GPU, Proc. of 12th International Conference of Parallel Processing and Applied Mathematics, 224-233, 201709
  84. Simple and Fast Parallel Algorithms for the Voronoi Map and the Euclidean Distance Map, with GPU implementations, Proc. of 46th International Conference on Parallel Processing, 362-371, 201708
  85. Photomosaic Generation by Rearranging Subimages, with GPU Acceleration, Proc. of International Parallel and Distributed Processing Symposium Workshops, 942-951, 201705
  86. Accelerating the Smith-Waterman Algorithm Using Bitwise Parallel Bulk Computation Technique on GPU, Proc. of International Parallel and Distributed Processing Symposium Workshops, 932-941, 201705
  87. Efficient Byte Stream Pattern Test using Bloom Filter with Rolling Hash Functions on the FPGA, Proc. of International Symposium on Computing and Networking, 66-75, 201811
  88. A Prefix-Sum-Based Rabin-Karp Implementation for Multiple Pattern Matching on GPGPU, Proc. of International Symposium on Computing and Networking, 139-145, 201811
  89. Tile Art Image Generation Using Conditional Generative Adversarial Networks, Proc. of International Symposium on Computing and Networking Workshops, 209-215, 201811
  90. An Optimal Parallel Algorithm for Computing the Summed Area Table on the GPU, Proc. of International Parallel and Distributed Processing Symposium Workshops, 763-772, 201811