2025
Rockenbach, Dinei A.; Araujo, Gabriell; Griebler, Dalvan; Fernandes, Luiz Gustavo. GSParLib: A multi-level programming interface unifying OpenCL and CUDA for expressing stream and data parallelism. Journal Article. In: Computer Standards & Interfaces, vol. 92, pp. 103922, 2025. doi: 10.1016/j.csi.2024.103922

@article{ROCKENBACH:GSParLib:CSI:25,
title = {GSParLib: A multi-level programming interface unifying OpenCL and CUDA for expressing stream and data parallelism},
author = {Dinei A. Rockenbach and Gabriell Araujo and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1016/j.csi.2024.103922},
doi = {10.1016/j.csi.2024.103922},
year = {2025},
date = {2025-03-01},
urldate = {2025-03-01},
journal = {Computer Standards & Interfaces},
volume = {92},
pages = {103922},
publisher = {Elsevier},
abstract = {The evolution of Graphics Processing Units (GPUs) has allowed the industry to overcome long-lasting problems and challenges. Many belong to the stream processing domain, whose central aspect is continuously receiving and processing data from streaming data producers such as cameras and sensors. Nonetheless, programming GPUs is challenging because it requires deep knowledge of many-core programming, mechanisms and optimizations for GPUs. Current GPU programming standards do not target stream processing and present programmability and code portability limitations. Among our main scientific contributions resides GSParLib, a C++ multi-level programming interface unifying CUDA and OpenCL for GPU processing on stream and data parallelism with negligible performance losses compared to manual implementations; GSParLib is organized in two layers: one for general-purpose computing and another for high-level structured programming based on parallel patterns; a methodology to provide unified and driver agnostic interfaces minimizing performance losses; a set of parallelism strategies and optimizations for GPU processing targeting stream and data parallelism; and new experiments covering GPU performance on applications exposing stream and data parallelism.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
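As a point of reference for the two-layer design described in this abstract, the sketch below illustrates the general idea of a driver-agnostic GPU layer underneath a parallel-pattern layer. It is only an illustration: the class and method names are assumptions, not GSParLib's actual API, and the backends are stubs so the example builds without any GPU SDK.

// Illustrative sketch only; GSParLib's real API differs. User code targets one
// interface and a backend (CUDA or OpenCL) is chosen when the pattern is built.
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// First (general-purpose) layer: one interface, multiple drivers behind it.
struct Driver {
    virtual ~Driver() = default;
    virtual void runKernel(const std::string& source, std::vector<float>& data) = 0;
};

struct CudaDriver : Driver {            // would wrap the CUDA driver API
    void runKernel(const std::string&, std::vector<float>& data) override {
        for (auto& x : data) x *= 2.0f; // stub: pretend the GPU doubled each element
    }
};

struct OpenCLDriver : Driver {          // would wrap the OpenCL API
    void runKernel(const std::string&, std::vector<float>& data) override {
        for (auto& x : data) x *= 2.0f;
    }
};

// Second (pattern) layer: a Map pattern hiding kernel-launch details.
class Map {
public:
    Map(std::unique_ptr<Driver> d, std::string kernelSource)
        : driver_(std::move(d)), source_(std::move(kernelSource)) {}
    void operator()(std::vector<float>& data) { driver_->runKernel(source_, data); }
private:
    std::unique_ptr<Driver> driver_;
    std::string source_;
};

int main() {
    std::vector<float> v{1, 2, 3, 4};
    // The same user code works whichever backend is selected here.
    Map doubleAll(std::make_unique<CudaDriver>(), "out[i] = in[i] * 2;");
    doubleAll(v);
    for (float x : v) std::printf("%.1f ", x);  // prints: 2.0 4.0 6.0 8.0
    std::printf("\n");
}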
2024
Hoffmann, Renato B.; Griebler, Dalvan; Righi, Rodrigo Rosa; Fernandes, Luiz G. Benchmarking parallel programming for single-board computers. Journal Article. In: Future Generation Computer Systems, vol. 161, pp. 119-134, 2024. doi: 10.1016/j.future.2024.07.003

@article{HOFFMANN:single-board-computers:FGCS:24,
title = {Benchmarking parallel programming for single-board computers},
author = {Renato B. Hoffmann and Dalvan Griebler and Rodrigo Rosa Righi and Luiz G. Fernandes},
url = {https://doi.org/10.1016/j.future.2024.07.003},
doi = {10.1016/j.future.2024.07.003},
year = {2024},
date = {2024-12-01},
urldate = {2024-12-01},
journal = {Future Generation Computer Systems},
volume = {161},
pages = {119-134},
publisher = {Elsevier},
abstract = {Within the computing continuum, SBCs (single-board computers) are essential in the Edge and Fog, with many featuring multiple processing cores and GPU accelerators. In this way, parallel computing plays a crucial role in enabling the full computational potential of SBCs. However, selecting the best-suited solution in this context is inherently complex due to the intricate interplay between PPI (parallel programming interface) strategies, SBC architectural characteristics, and application characteristics and constraints. To our knowledge, no solution presents a combined discussion of these three aspects. To tackle this problem, this article aims to provide a benchmark of the best-suited parallelism PPIs given a set of hardware and application characteristics and requirements. Compared to existing benchmarks, we introduce new metrics, additional applications, various parallelism interfaces, and extra hardware devices. Therefore, our contributions are the methodology to benchmark parallelism on SBCs and the characterization of the best-performing parallelism PPIs and strategies for given situations. We are confident that parallel computing will be mainstream to process edge and fog computing; thus, our solution provides the first insights regarding what kind of application and parallel programming interface is the most suited for a particular SBC hardware.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
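As background for the benchmarking methodology summarized above, the sketch below shows how the usual derived metrics for comparing parallel programming interfaces, speedup and parallel efficiency, are obtained from wall-clock measurements. It is a generic illustration, not the paper's benchmark suite or its new metrics.

// Generic timing sketch: one serial run and one parallel run of the same dummy
// workload, then speedup S = T_serial / T_parallel and efficiency E = S / p.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

static double work(std::size_t begin, std::size_t end) {
    double acc = 0.0;
    for (std::size_t i = begin; i < end; ++i) acc += 1.0 / (1.0 + i);  // dummy workload
    return acc;
}

template <typename F>
static double timeIt(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const std::size_t n = 200'000'000;
    const unsigned p = std::max(1u, std::thread::hardware_concurrency());

    double tSerial = timeIt([&] { volatile double r = work(0, n); (void)r; });

    double tParallel = timeIt([&] {
        std::vector<std::thread> ts;
        std::vector<double> partial(p, 0.0);
        for (unsigned k = 0; k < p; ++k)                 // one chunk per thread
            ts.emplace_back([&, k] { partial[k] = work(k * n / p, (k + 1) * n / p); });
        for (auto& t : ts) t.join();
        volatile double r = std::accumulate(partial.begin(), partial.end(), 0.0);
        (void)r;
    });

    double speedup = tSerial / tParallel;
    double efficiency = speedup / p;
    std::printf("threads=%u speedup=%.2f efficiency=%.2f\n", p, speedup, efficiency);
}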
Guder, Larissa; Aires, João Paulo; Griebler, Dalvan. Dimensional Speech Emotion Recognition: a Bimodal Approach. Inproceedings. In: Anais Estendidos do XXX Simpósio Brasileiro de Sistemas Multimídia e Web, pp. 5-6, SBC, Juiz de Fora, Brazil, 2024. doi: 10.5753/webmedia_estendido.2024.244402

@inproceedings{GUDER:WEBMEDIA:24,
title = {Dimensional Speech Emotion Recognition: a Bimodal Approach},
author = {Larissa Guder and João Paulo Aires and Dalvan Griebler},
url = {https://doi.org/10.5753/webmedia_estendido.2024.244402},
doi = {10.5753/webmedia_estendido.2024.244402},
year = {2024},
date = {2024-10-01},
booktitle = {Anais Estendidos do XXX Simpósio Brasileiro de Sistemas Multimídia e Web},
pages = {5-6},
publisher = {SBC},
address = {Juiz de Fora, Brasil},
abstract = {Considering the human-machine relationship, affective computing aims to allow computers to recognize or express emotions. Speech Emotion Recognition is a task from affective computing that aims to recognize emotions in an audio utterance. The most common way to predict emotions from the speech is using pre-determined classes in the offline mode. In that way, emotion recognition is restricted to the number of classes. To avoid this restriction, dimensional emotion recognition uses dimensions such as valence, arousal, and dominance, which can represent emotions with higher granularity. Existing approaches propose using textual information to improve results for the valence dimension. Although recent efforts have tried to improve results on speech emotion recognition to predict emotion dimensions, they do not consider real-world scenarios, where processing the input in a short time is necessary. Considering these aspects, this work provides the first step towards creating a bimodal approach for Dimensional Speech Emotion Recognition in streaming. Our approach combines sentence and audio representations as input to a recurrent neural network that performs speech-emotion recognition. We evaluate different methods for creating audio and text representations, as well as automatic speech recognition techniques. Our best results achieve 0.5915 of CCC for arousal, 0.4165 for valence, and 0.5899 for dominance in the IEMOCAP dataset.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
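CCC in these results is Lin's concordance correlation coefficient, the standard agreement metric for dimensional (valence/arousal/dominance) prediction. For reference, a small sketch of its computation follows; it is not the authors' code, and the sample values are made up.

// Lin's concordance correlation coefficient:
// CCC = 2*cov(x,y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
#include <cstddef>
#include <cstdio>
#include <vector>

static double mean(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s / v.size();
}

static double ccc(const std::vector<double>& pred, const std::vector<double>& gold) {
    const double mx = mean(pred), my = mean(gold);
    double vx = 0.0, vy = 0.0, cov = 0.0;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        vx  += (pred[i] - mx) * (pred[i] - mx);
        vy  += (gold[i] - my) * (gold[i] - my);
        cov += (pred[i] - mx) * (gold[i] - my);
    }
    vx /= pred.size(); vy /= gold.size(); cov /= pred.size();
    return 2.0 * cov / (vx + vy + (mx - my) * (mx - my));
}

int main() {
    // Toy predictions vs. annotations on a [-1, 1] valence scale (made-up numbers).
    std::vector<double> pred = {0.1, 0.4, -0.2, 0.7, 0.0};
    std::vector<double> gold = {0.2, 0.5, -0.1, 0.6, -0.1};
    std::printf("CCC = %.4f\n", ccc(pred, gold));
}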
Vogel, Adriano; Danelutto, Marco; Torquati, Massimo; Griebler, Dalvan; Fernandes, Luiz Gustavo. Enhancing self-adaptation for efficient decision-making at run-time in streaming applications on multicores. Journal Article. In: The Journal of Supercomputing, vol. 80, no. 15, pp. 22213-22244, 2024. doi: 10.1007/s11227-024-06191-w

@article{VOGEL:Supercomputing:24,
title = {Enhancing self-adaptation for efficient decision-making at run-time in streaming applications on multicores},
author = {Adriano Vogel and Marco Danelutto and Massimo Torquati and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/s11227-024-06191-w},
doi = {10.1007/s11227-024-06191-w},
year = {2024},
date = {2024-10-01},
urldate = {2024-10-01},
journal = {The Journal of Supercomputing},
volume = {80},
number = {15},
pages = {22213-22244},
publisher = {Springer},
abstract = {Parallel computing is very important to accelerate the performance of computing applications. Moreover, parallel applications are expected to continue executing in more dynamic environments and react to changing conditions. In this context, applying self-adaptation is a potential solution to achieve a higher level of autonomic abstractions and runtime responsiveness. In our research, we aim to explore and assess the possible abstractions attainable through the transparent management of parallel executions by self-adaptation. Our primary objectives are to expand the adaptation space to better reflect real-world applications and assess the potential for self-adaptation to enhance efficiency. We provide the following scientific contributions: (I) A conceptual framework to improve the designing of self-adaptation; (II) A new decision-making strategy for applications with multiple parallel stages; (III) A comprehensive evaluation of the proposed decision-making strategy compared to the state-of-the-art. The results demonstrate that the proposed conceptual framework can help design and implement self-adaptive strategies that are more modular and reusable. The proposed decision-making strategy provides significant gains in accuracy compared to the state-of-the-art, increasing the parallel applications' performance and efficiency.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
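To make the notion of run-time self-adaptation concrete, the sketch below shows a generic MAPE-style feedback loop that adjusts the number of replicas of a single parallel stage toward a target throughput. It is a simplified illustration, not the decision-making strategy proposed in the paper (which targets applications with multiple parallel stages).

// Generic self-adaptation sketch: monitor throughput, decide, and adapt the
// degree of parallelism of one stage toward a user-given service level.
#include <algorithm>
#include <cstdio>

struct Stage {
    int replicas = 1;
    // Toy model of measured throughput: diminishing returns as replicas grow.
    double measureThroughput() const { return 100.0 * replicas / (1.0 + 0.1 * replicas); }
};

int main() {
    const double targetThroughput = 500.0;   // items/s requested by the user
    const int maxReplicas = 16;
    Stage stage;

    for (int step = 0; step < 10; ++step) {           // periodic control loop
        double observed = stage.measureThroughput();   // Monitor
        if (observed < 0.95 * targetThroughput)        // Analyze + Plan
            stage.replicas = std::min(stage.replicas + 1, maxReplicas);
        else if (observed > 1.10 * targetThroughput)
            stage.replicas = std::max(stage.replicas - 1, 1);
        // Execute: a real runtime would add or remove worker threads here.
        std::printf("step=%d replicas=%d throughput=%.1f\n", step, stage.replicas, observed);
    }
}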
Faé, Leonardo; Griebler, Dalvan. An internal domain-specific language for expressing linear pipelines: a proof-of-concept with MPI in Rust. Inproceedings. In: Anais do XXVIII Simpósio Brasileiro de Linguagens de Programação, pp. 81-90, SBC, Curitiba/PR, 2024. doi: 10.5753/sblp.2024.3691

@inproceedings{FAE:SBLP:24,
title = {An internal domain-specific language for expressing linear pipelines: a proof-of-concept with MPI in Rust},
author = {Leonardo Faé and Dalvan Griebler},
url = {https://doi.org/10.5753/sblp.2024.3691},
doi = {10.5753/sblp.2024.3691},
year = {2024},
date = {2024-09-01},
booktitle = {Anais do XXVIII Simpósio Brasileiro de Linguagens de Programação},
pages = {81-90},
publisher = {SBC},
address = {Curitiba/PR},
series = {SBLP'24},
abstract = {Parallel computation is necessary in order to process massive volumes of data in a timely manner. There are many parallel programming interfaces and environments, each with their own idiosyncrasies. This, alongside non-deterministic errors, make parallel programs notoriously challenging to write. Great effort has been put forth to make parallel programming for several environments easier. In this work, we propose a DSL for Rust, using the language’s source-to-source transformation facilities, that allows for automatic code generation for distributed environments that support the Message Passing Interface (MPI). Our DSL simplifies MPI’s quirks, allowing the programmer to focus almost exclusively on the computation at hand. Performance experiments show nearly or no runtime difference between our abstraction and manually written MPI code while resulting in less than half the lines of code. More elaborate code complexity metrics (Halstead) estimate from 4.5 to 14.7 times lower effort for expressing parallelism.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
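For contrast with the Rust DSL proposed in the paper, the sketch below is a small hand-written MPI linear pipeline (C++ calling the standard MPI C API): rank 0 produces items, middle ranks transform and forward, and the last rank consumes. This explicit rank/tag/send/receive boilerplate is what the DSL abstracts away; the code is illustrative, not the DSL's generated output.

// Hand-written MPI linear pipeline.
// Build/run (example): mpic++ pipeline.cpp -o pipeline && mpirun -np 3 ./pipeline
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nItems = 5;
    for (int i = 0; i < nItems; ++i) {
        double item;
        if (rank == 0) {
            item = static_cast<double>(i);                        // source stage
        } else {
            MPI_Recv(&item, 1, MPI_DOUBLE, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);           // receive from previous stage
            item = item * 2.0;                                     // this stage's computation
        }
        if (rank < size - 1) {
            MPI_Send(&item, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);  // forward to next stage
        } else {
            std::printf("sink received %.1f\n", item);             // sink stage
        }
    }
    MPI_Finalize();
}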