Numerical modeling of extrusion forming tools: improving its efficiency on heterogeneous parallel computers

Utilize este identificador para referenciar este registo: https://hdl.handle.net/1822/36701

Título:	Numerical modeling of extrusion forming tools: improving its efficiency on heterogeneous parallel computers
Autor(es):	Pereira, David dos Santos
Orientador(es):	Proença, Alberto José Gonçalves, Nelson Daniel Ferreira
Palavras-chave:	HPC Heterogeneous Platforms Extrusion Forming Tools Parallel Computing Iterative Solvers Newtonian Fluids Generalized Newtonian Fluids
Data:	18-Dez-2014
Resumo(s):	Polymer processing usually requires several experimentation and calibration attempts to lead to a final result with the desired quality. As this results in large costs, software applications have been developed aiming to replace laboratory experimentation by computer based simulations and hence lower these costs. The focus of this dissertation was on one of these applications, the FlowCode, an application which helps the design of extrusion forming tools, applied to plastics processing or in the processing of other fluids. The original application had two versions of the code, one to run in a single-core CPU and the other for NVIDIA GPU devices. With the increasing use of heterogeneous platforms, many applications can now benefit and leverage the computational power of these platforms. As this requires some expertise, mostly to schedule tasks/functions and transfer the necessary data to the devices, several frameworks were developed to aid the development - with StarPU being the one with more international relevance, although other ones are emerging such as Dynamic Irregular Computing Environment (DICE). The main objectives of this dissertation were to improve the FlowCode, and to assess the use of one framework to develop an efficient heterogeneous version. Only the CPU version of the code was improved, by first applying techniques to the sequential version and parallelizing it afterwards using OpenMP on both multi-core CPU devices (Intel Xeon 12-core) and on many-core devices (Intel Xeon Phi 61-core). For the heterogeneous version, StarPU was chosen after studying both StarPU and DICE frameworks. Results show the parallel CPU version to be faster than the GPU one, for all input datasets. The GPU code is far from being efficient, requiring several improvements, so comparing the devices with each other would not be fair. The Xeon Phi version proves to be the faster one when no framework is used. For the StarPU version, several schedulers were tested to evaluate the faster one, leading to the most efficient to solve our problem. Executing the code on two GPU devices is 1.7 times faster than when executing the GPU version without the framework. Adding the CPU to the GPUs of the testing environment do not improve execution time with most schedulers due to the lack of available parallelism in the application. Globally, the StarPU version is the faster one followed by the Xeon Phi, CPU and GPU versions. O processamento de polímeros requer normalmente várias tentativas de experimentação e calibração de modo a que o resultado final tenha a qualidade pretendida. Como isto resulta em custos elevados, diversas aplicações foram desenvolvidas para substituir a parte de experimentação laboratorial por simulações por computador e consequentemente, reduzir esses custos. Este dissertação foca-se numa dessas aplicações, o FlowCode, uma aplicação de ajuda à conceção de ferramentas de extrusão aplicada no processamento de plásticos ou no processamento de outros tipos de fluidos. Esta aplicação inicial era composta por duas versões, uma executada sequencialmente num processador e outra executada em aceleradores computacionais NVIDIA GPU. Com o aumento da utilização de plataformas heterogéneas, muitas aplicações podem beneficiar do poder computacional destas plataformas. Como isto requer alguma experiência, principalmente para escalonar tarefas/funções e transferir os dados necessários para os aceleradores, várias frameworks foram desenvolvidas para ajudar ao desenvolvimento - sendo StarPU a framework com mais relevância internacional, embora outras estejam a surgir como a framework DICE. Os principais objetivos desta dissertação eram melhorar o FlowCode assim como avaliar a utilização de uma framework para desenvolver uma versão heterogénea eficiente. Apenas a versão CPU foi melhorada, primeiro aplicando técnicas na versão sequencial, e depois procedendo à paralelização usando OpenMP em CPUs multi-core (Intel Xeon 12-core) e aceleradores many-core (Intel Xeon Phi 61-core). Para a versão heterogénea, foi escolhido a framework StarPU depois de se ter feito um estudo das frameworks StarPU e DICE. Os resultados mostram que a versão CPU paralela é mais rápida que a GPU em todos os casos testados. O código GPU está longe de ser eficiente, necessitando diversas melhorias. Portanto, uma comparação entre CPUs, GPUs e Xeon Phi’s não seria justa. A versão Xeon Phi revela-se ser a mais rápida quando não é usada nenhuma framework. Para a versão StarPU, vários escalonadores foram testados para avaliar o mais rápido, levando ao mais eficiente para resolver o nosso problema. Executar o código em dois GPUs é 1.7 vezes mais rápido do que executar para um GPU sem framework em um dos casos testados. Adicionar o CPU aos GPUs do ambiente de teste não melhora o tempo de execução para a maioria dos escalonadores devido à falta de paralelismo disponível. Globalmente, a versão StarPU é a mais rápida seguida das versões Xeon Phi, CPU, e GPU.
Tipo:	Dissertação de mestrado
Descrição:	Dissertação de mestrado em Engenharia Informática
URI:	https://hdl.handle.net/1822/36701
Acesso:	Acesso aberto
Aparece nas coleções:	BUM - Dissertações de Mestrado

Ficheiros deste registo:

Ficheiro	Descrição	Tamanho	Formato
eeum_di_dissertacao_pg22821.pdf		2,34 MB	Adobe PDF	Ver/Abrir

Ver registo completo Sugerir correção Estatísticas