• Home
  • Chat IA
  • Guru IA
  • Tutores
  • Central de ajuda
Home
Chat IA
Guru IA
Tutores

·

Ciência da Computação ·

Arquitetura de Computadores

Envie sua pergunta para a IA e receba a resposta na hora

Recomendado para você

Mini Projeto de Computação Paralela

10

Mini Projeto de Computação Paralela

Arquitetura de Computadores

MACKENZIE

Trabalho Assembly

9

Trabalho Assembly

Arquitetura de Computadores

UFPB

Arquitetura de Computadores

11

Arquitetura de Computadores

Arquitetura de Computadores

UFPI

Estudo do Padrão IEEE 754 para Aritmética de Ponto Flutuante em MIPS

28

Estudo do Padrão IEEE 754 para Aritmética de Ponto Flutuante em MIPS

Arquitetura de Computadores

UECE

Simulador MARS - Instrucoes para Calculo de RAID 0 e 1

1

Simulador MARS - Instrucoes para Calculo de RAID 0 e 1

Arquitetura de Computadores

FACAPE

Entrada e Saída com Win32 e MASM32

12

Entrada e Saída com Win32 e MASM32

Arquitetura de Computadores

UNIPE

Guia Iniciante Arduino - Multilógica Shop

150

Guia Iniciante Arduino - Multilógica Shop

Arquitetura de Computadores

UFG

Arquitetura do Conjunto de Instruções Assembly

22

Arquitetura do Conjunto de Instruções Assembly

Arquitetura de Computadores

UNIPE

Trabalho de Implementação com Avx e Avx2

2

Trabalho de Implementação com Avx e Avx2

Arquitetura de Computadores

UFLA

Classes dos Computadores Mordernos

7

Classes dos Computadores Mordernos

Arquitetura de Computadores

UNIP

Texto de pré-visualização

19th Marathon of Parallel Programming SSCAD 2024 Calebe P Bianchini1 and Helio C Guardia2 1Mackenzie Presbyterian University 2Universidade Federal de Sao Carlos Rules for Remote Contest For all problems read carefully the input and output session For all problems a sequen tial implementation is given and it is against the output of those implementations that the output of your programs will be compared to decide if your implementation is correct You can modify the program in any way you see fit except when the problem descrip tion states otherwise You must upload a compressed file zip with your source code the Makefile and an execution script The script must have the name of the problem You can submit as many solutions to a problem as you want Only the last submission will be considered The Makefile must have the rule all which will be used to compile your source code The execution script runs your solution the way you design it it will be inspected not to corrupt the target machine The execution time of your program will be measured running it with time program and taking the real CPU time given Each program will be executed at least three times with the same input and the mean time will be taken into account The sequential program given will be measured the same way You will earn points in each problem correspond ing to the division of the sequential time by the time of your program speedup The team with the most points at the end of the marathon will be declared the winner This problem set contains 4 problems pages are numbered from 01 to 07 General Information Compilation You should use CC or CXX inside your Makefile Be careful when redefining them There is a simple Makefile inside you problem package that you can modify Example FLAGSO3 EXECsum CXXg all EXEC EXEC CXX FLAGS EXECcpp c o EXECo CXX FLAGS EXECo o EXEC Each judge machine has its group of compilers See them below and choose well when writing your Makefile The compiler that is tagged as default is predefined in CC and CXX variables machine compiler command host GCC 940 default C gcc C g MPI Open MPI 411 default C mpicc C mpic GCC 940 C gcc C g gpu NVidia CUDA 1201 default C nvcc C nvcc GCC 831 C gcc C g Submitting General information You must have an execution script that has the same name of the problem This script runs your solution the way you design it There is a simple script inside you problem package that should be modified Example binbash This script runs a generic Problem A Using 32 threads and OpenMP export OMPNUMTHREADS32 OMPNUMTHREADS32 sum Submitting MPI If you are planning to submit an MPI solution you should compile using mpiccmpic The script must call mpirunmpiexec with the correct number of processes max 160 binbash This script runs a generic Problem A Using MPI in the entire cluster 4 nodes mpirun np 4 sum Comparing times results In your personal machine measure the execution time of your solution using time pro gram Add inputoutput redirection when collecting time Use diff program to compare the original and your solution results Example time p A originalinputtxt myoutputtxt real 494 user 008 sys 156 diff myoutputtxt originaloutputtxt Do not measure time and do not add inputoutput redirection when submitting your so lution the autojudge system is prepared to collect your time and compare the results Problem A Optimizing Media Storage on Blu Rays Leonardo M Takuno Anyone who has ever recorded a variety of media onto CDs DVDs or Blu Rays knows that optimizing the use of space on each disk was never an easy task Sometimes it was necessary to adjust and reorganize movies photos and music to fit everything properly onto the disks Isabela needs to record a mix of movies photos and music onto her Blu Rays She has a collection of digital files on her computer and would like to distribute them across several Blu Rays She knows that each Blu Ray has a maximum storage capacity and is aware of the size of each type of file However she is having trouble deciding which files to put on which Blu Ray to maximize the use of each disk Write a parallel version of the this solution Input The first line of input consists of two positive integers N and K which represent the total number of files movies photos and music on Isabelas computer and the number of Blu Rays she has The second line of input consists of N positive integers which represent the size in gigabytes of each file The last line of input consists of K positive integers which represent the maximum storage capacity in gigabytes of each Blu Ray No file is larger than 50 gigabytes and no Blu Ray has a capacity greater than 50 gigabytes The input must be read from the standard input Output Print a single line containing the maximum total number of gigabytes of files that can be recorded on the Blu Rays The output must be written to the standard output Example Sample Input 1 Sample output 1 8 3 18 37 46 37 47 1 10 46 17 1 7 17 1 Problem B Traveling Salesman Problem Solver Lucas S Rosa Alfredo Goldman The Traveling Salesman Problem TSP is a classic optimization challenge in computer science and operations research It asks the question Given a list of cities and the distances between each pair of cities what is the shortest possible route that visits each city exactly once and returns to the origin city This problem is of great importance in various realworld applications including 1 Logistics and transportation planning 2 Circuit board drilling in manufacturing 3 DNA sequencing in bioinformatics 4 Computer wiring 5 Delivery route optimization The TSP is known to be NPhard meaning that as the number of cities increases the time required to find the optimal solution grows exponentially This makes it an excellent candidate for exploring heuristic algorithms This solution implements a TSP solver using the Shotgun Hill Climbing algorithm Heres a highlevel overview of its functionality 1 It reads a distance matrix from a CSV file representing the distances between cities 2 It uses a shotgun approach which involves multiple restarts of a hill climbing search 3 For each restart It generates a random initial tour It then repeatedly attempts to improve this tour using 2opt swaps If no improvement is found it moves to the next restart 4 After all restarts it returns the best tour found across all attempts The algorithm balances between exploration through multiple random restarts and ex ploitation through hill climbing aiming to find a good approximate solution to the TSP in a reasonable amount of time Write a parallel version of the this solution Input The first line of the input contains 1 the umber of iterations for each hill climbing attempt 2 the number of restarts shotgun attempts 3 a seed for the random number generator The following lines contains the distance matrix describing the distances of the cities The input must be read from the standard input 2 Output The program outputs 1 The best tour found represented as a sequence of city indices 2 The total length of this tour In this output the sequence of numbers represents the order in which cities are visited in the best tour found and the tour length represents the total distance traveled in this tour The output must be written to the standard output Example Sample input 1 Sample output 1 2000 20 17 Best tour found 0 2 1 010203 Tour length 03 000100 020001 3 Problem C Zeros of Riemann Zeta Function Luiz A Steffenel Zeros of the Riemann zeta function denoted as ζs are the values of the complex vari able s for which ζs 0 There are two types of zeros trivial and nontrivial Trivial zeros occur at negative even integers s 2 4 6 These are considered trivial because their existence is relatively easy to prove using the functional equation of the zeta function Nontrivial zeros are the complex values of s for which ζs 0 and have real part be tween 0 and 1 They are called nontrivial because their distribution is less understood and their study is important for understanding prime numbers and related objects in num ber theory Riemann Hypothesis is one of the most famous and longstanding unsolved problems in mathematics proposed by Bernhard Riemann in 1859 It conjectures that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 12 The code related to this challenge computes the number of zeros on the critical line of the Zeta function The objective is not to compute the zeros we count them to check that they are on the Riemann Line The exercise is to sample a region on the critical line to count how many times the function changes sign so that there is at least 1 zero between 2 sampling points Here we use a constant sampling but you can recode entirely the way to proceed Write a parallel version of the this solution Input In the first line of the input youll receive three integers the first L means the lower bound the cedont U mena sthe upper bound and the last S means how many samples will be tested The input must be read from the standard input Output The output contains a single line It has the total number of zeros The output must be written to the standard output 4 Examples Sample input 1 Sample output 1 10 1000 100 I found 649 Zeros 5 Problem D The Tree Center Tiago A O Alves The eccentricity of a node u in a Graph GV E is the maximum distance between u and another node v v V In a tree the center is the node with the minimum eccentricity You can see an example at Figure 10 2 4 6 7 3 5 9 8 1 Figure 1 A tree in which the center is the node 4 with maximum distance 3 This problem regards taking a tree as input and returning its center Your assignment is to develop a parallel solution for solving this problem Input The first line of input has an integer meaning the total number of nodes on the graph The following lines has the undirected edges between nodes The input must be read from the standard input Output The output contains the label of the center for a tree The output must be written to the standard output 6 Example Sample input 1 Sample output 1 10 4 10 4 10 2 10 6 4 7 6 3 6 5 7 9 7 8 8 1 7 Parallelization of a Radix Sort Algorithm Calebe de Paula Bianchini calebebianchinimackenziebr 1 The Problem Description Given a set of unsorted items with keys that can be considered as a binary representation of an integer the bits within the key can be used to sort the set of items This method of sorting is known as Radix Sort Write a program that includes a threaded version of a Radix Sort algorithm that sorts the keys read from an input file then output the sorted keys to another file The input and output file names shall be the first and second arguments on the command line of the application execution The first line of the input text file is the total number of keys N to be sorted this is followed by N keys one per line in the file A key will be a sevencharacter string made up of printable characters not including the space character ASCII 0x20 The number of keys within the file is less than 231 1 Sorted output must be stored in a text file one key per line Timing If you put timing code into your application to time the sorting process and report the elapsed time this time will be used for scoring If no timing code is added the entire execution time including time for input and output will be used for scoring 2 The Serial Solution The Radix Sort algorithms can be classified in two basic groups 1 2 least significant digit LSD and most significant digit MSD The LSD approach examines the digits in the keys in a righttoleft order working with the least significant digits The other approach MSD examines the digits in the key in a lefttoright order Figure 1 3 show an example of LSD and MSD radixsort using 3digits integer keys Figure 1 Summary of LSD and MSD approaches When a digit is chosen for MSD radixsort eg 2nd column of MSD picture an internal sorting is used to organize the keys 1 grouping them with the same digit The serial solution proposed uses a MDS radixsort algorithm to sort the keys and uses a fatpivot quicksort algorithm threeway partitioning 1 4 for internal sorting taking advantage of repeating digits 3 The First Parallel Solution Attempt Studying carefully the MSD approach it is possible to realize that when the first significant digit is ordered 1st step all the keys are partitioned by its first digit when the second significant digit is ordered 2nd step the keys are partitioned again Considering that each partition is a job and a job can be allocated for one processor without race condition it is possible to project a parallel algorithm for radixsort For threadcontrol the solution uses OpenMP 4 but still uses POSIX mutex and semaphores for IPC and critical section The idea is creating a job every time a partition is identified This job is put into a pool of jobs Each processor retrieves jobs from the pool like producerconsumer 5 and sorts the partkeys In this partsorting more jobs are created because other partitions are identified see Figure 1 When the pool of jobs is empty there are no jobs anymore and the entire keys are sorted stopping the algorithm It would be an interesting approach if there were not a critical section producerconsumer mutexsemaphore The critical section is a bottleneck for this solution Moreover the sorting algorithm is very fast for a small amount of keys 1000000 being faster than these producerconsumer approach Table 1 show a simple time comparison for the serial solution and a dualprocess parallel solution the externaltime means total time and the parenthesestime indicates the first most significant digit sort time Table 1 Simple time comparison in seconds Number of keys Serial Solution 1st Parallel Solution 2 proc 1000 0000487 0010744 0000369 100000 0063586 0525342 0025772 500000 0389194 73858603 0120917 1000000 0832710 879922349 0218579 50000000 47611455 ω 11061750 100000000 104441001 ω 23215623 4 The Final Parallel Solution The final parallel solution uses the same idea from the last parallel solution Beyond the normal threads an additional thread is created only for synchronization it will remain sleeping until all the other threads reach an end similar to readerwriter problem 5 This thread synchronization guarantees the values to the output file Analyzing the input ASCII from 0x21 to 0x7E 6 there are only 94 printable characters In other words only 94 partition after sorting the 1st most significant digit Considering a balanced distribution each of these partitions will contain 22845570 keys and based on the jobs problem from Table 1 it is not a good idea to have more jobs So after the creation of these 94 jobs each thread sorts the keys on the partition by itself If these keys are unbalanced there is a threshold that indicates to create a job is a partition has more than 22845570 keys Besides the first job for the first most significant digit begins to be the slowest part of the execution for a number of keys above 100000000 The iterative quicksort has to be improved and checkpoints are created These checkpoints indicate that the quicksort already have sorted part of the total keys After that the partitions until the checkpoint are verified and all the jobs created Thats mean from each checkpoint new jobs are put on the pool for the parallelthread sorting This improvement for quicksort considers the use of virtual memory in the hardwaresoftware specification too All the jobs created sorts first the firsts elements of the array avoiding too much swap from the disk to the memory 2311 keys with 71 character length will consume at least 16Gbytes of RAM Table 2 show the values collected from an 8processor machine for a time comparison between the algorithms Table 2 Comparison between algorithms in seconds Number of keys Serial Solution Parallel Solution 2proc 4proc 8proc 1000 0000487 0008024 0008484 0105758 100000 0063586 0041309 0031875 0104594 500000 0389194 0225825 0135147 0128708 1000000 0832710 0502728 0289878 0236469 50000000 47611455 27055081 15870333 11025388 100000000 104441001 59439492 33920247 23915724 500000000 511607666 287287585 166408876 122619186 1000000000 1174523109 595044868 355776813 263528890 5 Conclusion Sorting is the most common task in computer Because ordered data are easy to manipulate many software require sorted data But the amount of the data sorted is fundamental to construct an efficient parallel sort algorithm 7 The data have to be organized on memory and the processes have to manipulate it in a coordinate manner The results from last section show that a parallel algorithm is easy to project but it is not so easy to get a linear speedup see Table 2 In general considering a good sort algorithms a small amount of data sorted in a serial execution is faster than the same approach with a parallel algorithm Thus a good approach is to consider the amount of data to decide the way to sort serial or parallel 6 References 1 Sedgewick R Algorithms in Java Parts 14 3th edition AddisonWesley 2002 2 Wikipedia Radix Sort URL httpenwikipediaorgwikiRadixsort Accessed in 042009 3 Figure 1 RadixSort URL httpwwwstrchrcommediaradixsortpng Accesses in 042009 4 Wikipedia Quicksort URL httpptwikipediaorgwikiQuicksort Accessed in 042008 4 OpenMP OpenMPorg URL httpopenmporgwp Accessed in 042009 5 Tanembaum A Modern Operation Systems PrenticeHall 2007 6 Wikipedia ASCII URL httpenwikipediaorgwikiASCII Accessed in 042009 7 Grama A Gupta A Karypis G Kumar V Introduction to Parallel Computing AddisonWesley 2003 Entrega Final A partir do tema escolhido para o Projeto desenvolva uma solução paralela Apresente a ideia em um texto semelhante ao writeup comentando as vantagens e desvantagens de sua solução Mostre também os tempos de execução da versão sequencial e da versão paralela em uma tabela Mostre as curvas de speedup usando escalabilidade forte como objetivo discutindo brevemente essas curvas Todas essas informações devem estar em seu writeup Portanto seu relatório deve ter 10 ponto tabela de tempo e gráfico de speedup local 10 ponto tabela e gráfico de eficiência local 10 ponto texto comparativo entre essas tabelas e considerações sobre escalabilidade forte 25 pontos discussão sobre o projeto do algoritmo paralelo e a soluçãoconstrução feita utilize o processo de identificação de grãos e mostreos 25 ponto explicação de algum detalhe do códigofonte relativo à solução do algoritmo em especial sobre a parte paralelizada 10 ponto listagem da bibliografia utilizada e corretamente referenciada nas explicações 10 ponto proporcional ao speedup alcançado o limite mínimo do speedup é 2 19th Marathon of Parallel Programming SSCAD 2024 Calebe P Bianchini1 and Helio C Guardia2 1Mackenzie Presbyterian University 2Universidade Federal de Sao Carlos Rules for Remote Contest For all problems read carefully the input and output session For all problems a sequen tial implementation is given and it is against the output of those implementations that the output of your programs will be compared to decide if your implementation is correct You can modify the program in any way you see fit except when the problem descrip tion states otherwise You must upload a compressed file zip with your source code the Makefile and an execution script The script must have the name of the problem You can submit as many solutions to a problem as you want Only the last submission will be considered The Makefile must have the rule all which will be used to compile your source code The execution script runs your solution the way you design it it will be inspected not to corrupt the target machine The execution time of your program will be measured running it with time program and taking the real CPU time given Each program will be executed at least three times with the same input and the mean time will be taken into account The sequential program given will be measured the same way You will earn points in each problem correspond ing to the division of the sequential time by the time of your program speedup The team with the most points at the end of the marathon will be declared the winner This problem set contains 4 problems pages are numbered from 01 to 07 General Information Compilation You should use CC or CXX inside your Makefile Be careful when redefining them There is a simple Makefile inside you problem package that you can modify Example FLAGSO3 EXECsum CXXg all EXEC EXEC CXX FLAGS EXECcpp c o EXECo CXX FLAGS EXECo o EXEC Each judge machine has its group of compilers See them below and choose well when writing your Makefile The compiler that is tagged as default is predefined in CC and CXX variables machine compiler command host GCC 940 default C gcc C g MPI Open MPI 411 default C mpicc C mpic GCC 940 C gcc C g gpu NVidia CUDA 1201 default C nvcc C nvcc GCC 831 C gcc C g Submitting General information You must have an execution script that has the same name of the problem This script runs your solution the way you design it There is a simple script inside you problem package that should be modified Example binbash This script runs a generic Problem A Using 32 threads and OpenMP export OMPNUMTHREADS32 OMPNUMTHREADS32 sum Submitting MPI If you are planning to submit an MPI solution you should compile using mpiccmpic The script must call mpirunmpiexec with the correct number of processes max 160 binbash This script runs a generic Problem A Using MPI in the entire cluster 4 nodes mpirun np 4 sum Comparing times results In your personal machine measure the execution time of your solution using time pro gram Add inputoutput redirection when collecting time Use diff program to compare the original and your solution results Example time p A originalinputtxt myoutputtxt real 494 user 008 sys 156 diff myoutputtxt originaloutputtxt Do not measure time and do not add inputoutput redirection when submitting your so lution the autojudge system is prepared to collect your time and compare the results Problem A Optimizing Media Storage on Blu Rays Leonardo M Takuno Anyone who has ever recorded a variety of media onto CDs DVDs or Blu Rays knows that optimizing the use of space on each disk was never an easy task Sometimes it was necessary to adjust and reorganize movies photos and music to fit everything properly onto the disks Isabela needs to record a mix of movies photos and music onto her Blu Rays She has a collection of digital files on her computer and would like to distribute them across several Blu Rays She knows that each Blu Ray has a maximum storage capacity and is aware of the size of each type of file However she is having trouble deciding which files to put on which Blu Ray to maximize the use of each disk Write a parallel version of the this solution Input The first line of input consists of two positive integers N and K which represent the total number of files movies photos and music on Isabelas computer and the number of Blu Rays she has The second line of input consists of N positive integers which represent the size in gigabytes of each file The last line of input consists of K positive integers which represent the maximum storage capacity in gigabytes of each Blu Ray No file is larger than 50 gigabytes and no Blu Ray has a capacity greater than 50 gigabytes The input must be read from the standard input Output Print a single line containing the maximum total number of gigabytes of files that can be recorded on the Blu Rays The output must be written to the standard output Example Sample Input 1 Sample output 1 8 3 18 37 46 37 47 1 10 46 17 1 7 17 1 Problem B Traveling Salesman Problem Solver Lucas S Rosa Alfredo Goldman The Traveling Salesman Problem TSP is a classic optimization challenge in computer science and operations research It asks the question Given a list of cities and the distances between each pair of cities what is the shortest possible route that visits each city exactly once and returns to the origin city This problem is of great importance in various realworld applications including 1 Logistics and transportation planning 2 Circuit board drilling in manufacturing 3 DNA sequencing in bioinformatics 4 Computer wiring 5 Delivery route optimization The TSP is known to be NPhard meaning that as the number of cities increases the time required to find the optimal solution grows exponentially This makes it an excellent candidate for exploring heuristic algorithms This solution implements a TSP solver using the Shotgun Hill Climbing algorithm Heres a highlevel overview of its functionality 1 It reads a distance matrix from a CSV file representing the distances between cities 2 It uses a shotgun approach which involves multiple restarts of a hill climbing search 3 For each restart It generates a random initial tour It then repeatedly attempts to improve this tour using 2opt swaps If no improvement is found it moves to the next restart 4 After all restarts it returns the best tour found across all attempts The algorithm balances between exploration through multiple random restarts and ex ploitation through hill climbing aiming to find a good approximate solution to the TSP in a reasonable amount of time Write a parallel version of the this solution Input The first line of the input contains 1 the umber of iterations for each hill climbing attempt 2 the number of restarts shotgun attempts 3 a seed for the random number generator The following lines contains the distance matrix describing the distances of the cities The input must be read from the standard input 2 Output The program outputs 1 The best tour found represented as a sequence of city indices 2 The total length of this tour In this output the sequence of numbers represents the order in which cities are visited in the best tour found and the tour length represents the total distance traveled in this tour The output must be written to the standard output Example Sample input 1 Sample output 1 2000 20 17 Best tour found 0 2 1 010203 Tour length 03 000100 020001 3 Problem C Zeros of Riemann Zeta Function Luiz A Steffenel Zeros of the Riemann zeta function denoted as ζs are the values of the complex vari able s for which ζs 0 There are two types of zeros trivial and nontrivial Trivial zeros occur at negative even integers s 2 4 6 These are considered trivial because their existence is relatively easy to prove using the functional equation of the zeta function Nontrivial zeros are the complex values of s for which ζs 0 and have real part be tween 0 and 1 They are called nontrivial because their distribution is less understood and their study is important for understanding prime numbers and related objects in num ber theory Riemann Hypothesis is one of the most famous and longstanding unsolved problems in mathematics proposed by Bernhard Riemann in 1859 It conjectures that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 12 The code related to this challenge computes the number of zeros on the critical line of the Zeta function The objective is not to compute the zeros we count them to check that they are on the Riemann Line The exercise is to sample a region on the critical line to count how many times the function changes sign so that there is at least 1 zero between 2 sampling points Here we use a constant sampling but you can recode entirely the way to proceed Write a parallel version of the this solution Input In the first line of the input youll receive three integers the first L means the lower bound the cedont U mena sthe upper bound and the last S means how many samples will be tested The input must be read from the standard input Output The output contains a single line It has the total number of zeros The output must be written to the standard output 4 Examples Sample input 1 Sample output 1 10 1000 100 I found 649 Zeros 5 Problem D The Tree Center Tiago A O Alves The eccentricity of a node u in a Graph GV E is the maximum distance between u and another node v v V In a tree the center is the node with the minimum eccentricity You can see an example at Figure 10 2 4 6 7 3 5 9 8 1 Figure 1 A tree in which the center is the node 4 with maximum distance 3 This problem regards taking a tree as input and returning its center Your assignment is to develop a parallel solution for solving this problem Input The first line of input has an integer meaning the total number of nodes on the graph The following lines has the undirected edges between nodes The input must be read from the standard input Output The output contains the label of the center for a tree The output must be written to the standard output 6 Example Sample input 1 Sample output 1 10 4 10 4 10 2 10 6 4 7 6 3 6 5 7 9 7 8 8 1 7

Envie sua pergunta para a IA e receba a resposta na hora

Recomendado para você

Mini Projeto de Computação Paralela

10

Mini Projeto de Computação Paralela

Arquitetura de Computadores

MACKENZIE

Trabalho Assembly

9

Trabalho Assembly

Arquitetura de Computadores

UFPB

Arquitetura de Computadores

11

Arquitetura de Computadores

Arquitetura de Computadores

UFPI

Estudo do Padrão IEEE 754 para Aritmética de Ponto Flutuante em MIPS

28

Estudo do Padrão IEEE 754 para Aritmética de Ponto Flutuante em MIPS

Arquitetura de Computadores

UECE

Simulador MARS - Instrucoes para Calculo de RAID 0 e 1

1

Simulador MARS - Instrucoes para Calculo de RAID 0 e 1

Arquitetura de Computadores

FACAPE

Entrada e Saída com Win32 e MASM32

12

Entrada e Saída com Win32 e MASM32

Arquitetura de Computadores

UNIPE

Guia Iniciante Arduino - Multilógica Shop

150

Guia Iniciante Arduino - Multilógica Shop

Arquitetura de Computadores

UFG

Arquitetura do Conjunto de Instruções Assembly

22

Arquitetura do Conjunto de Instruções Assembly

Arquitetura de Computadores

UNIPE

Trabalho de Implementação com Avx e Avx2

2

Trabalho de Implementação com Avx e Avx2

Arquitetura de Computadores

UFLA

Classes dos Computadores Mordernos

7

Classes dos Computadores Mordernos

Arquitetura de Computadores

UNIP

Texto de pré-visualização

19th Marathon of Parallel Programming SSCAD 2024 Calebe P Bianchini1 and Helio C Guardia2 1Mackenzie Presbyterian University 2Universidade Federal de Sao Carlos Rules for Remote Contest For all problems read carefully the input and output session For all problems a sequen tial implementation is given and it is against the output of those implementations that the output of your programs will be compared to decide if your implementation is correct You can modify the program in any way you see fit except when the problem descrip tion states otherwise You must upload a compressed file zip with your source code the Makefile and an execution script The script must have the name of the problem You can submit as many solutions to a problem as you want Only the last submission will be considered The Makefile must have the rule all which will be used to compile your source code The execution script runs your solution the way you design it it will be inspected not to corrupt the target machine The execution time of your program will be measured running it with time program and taking the real CPU time given Each program will be executed at least three times with the same input and the mean time will be taken into account The sequential program given will be measured the same way You will earn points in each problem correspond ing to the division of the sequential time by the time of your program speedup The team with the most points at the end of the marathon will be declared the winner This problem set contains 4 problems pages are numbered from 01 to 07 General Information Compilation You should use CC or CXX inside your Makefile Be careful when redefining them There is a simple Makefile inside you problem package that you can modify Example FLAGSO3 EXECsum CXXg all EXEC EXEC CXX FLAGS EXECcpp c o EXECo CXX FLAGS EXECo o EXEC Each judge machine has its group of compilers See them below and choose well when writing your Makefile The compiler that is tagged as default is predefined in CC and CXX variables machine compiler command host GCC 940 default C gcc C g MPI Open MPI 411 default C mpicc C mpic GCC 940 C gcc C g gpu NVidia CUDA 1201 default C nvcc C nvcc GCC 831 C gcc C g Submitting General information You must have an execution script that has the same name of the problem This script runs your solution the way you design it There is a simple script inside you problem package that should be modified Example binbash This script runs a generic Problem A Using 32 threads and OpenMP export OMPNUMTHREADS32 OMPNUMTHREADS32 sum Submitting MPI If you are planning to submit an MPI solution you should compile using mpiccmpic The script must call mpirunmpiexec with the correct number of processes max 160 binbash This script runs a generic Problem A Using MPI in the entire cluster 4 nodes mpirun np 4 sum Comparing times results In your personal machine measure the execution time of your solution using time pro gram Add inputoutput redirection when collecting time Use diff program to compare the original and your solution results Example time p A originalinputtxt myoutputtxt real 494 user 008 sys 156 diff myoutputtxt originaloutputtxt Do not measure time and do not add inputoutput redirection when submitting your so lution the autojudge system is prepared to collect your time and compare the results Problem A Optimizing Media Storage on Blu Rays Leonardo M Takuno Anyone who has ever recorded a variety of media onto CDs DVDs or Blu Rays knows that optimizing the use of space on each disk was never an easy task Sometimes it was necessary to adjust and reorganize movies photos and music to fit everything properly onto the disks Isabela needs to record a mix of movies photos and music onto her Blu Rays She has a collection of digital files on her computer and would like to distribute them across several Blu Rays She knows that each Blu Ray has a maximum storage capacity and is aware of the size of each type of file However she is having trouble deciding which files to put on which Blu Ray to maximize the use of each disk Write a parallel version of the this solution Input The first line of input consists of two positive integers N and K which represent the total number of files movies photos and music on Isabelas computer and the number of Blu Rays she has The second line of input consists of N positive integers which represent the size in gigabytes of each file The last line of input consists of K positive integers which represent the maximum storage capacity in gigabytes of each Blu Ray No file is larger than 50 gigabytes and no Blu Ray has a capacity greater than 50 gigabytes The input must be read from the standard input Output Print a single line containing the maximum total number of gigabytes of files that can be recorded on the Blu Rays The output must be written to the standard output Example Sample Input 1 Sample output 1 8 3 18 37 46 37 47 1 10 46 17 1 7 17 1 Problem B Traveling Salesman Problem Solver Lucas S Rosa Alfredo Goldman The Traveling Salesman Problem TSP is a classic optimization challenge in computer science and operations research It asks the question Given a list of cities and the distances between each pair of cities what is the shortest possible route that visits each city exactly once and returns to the origin city This problem is of great importance in various realworld applications including 1 Logistics and transportation planning 2 Circuit board drilling in manufacturing 3 DNA sequencing in bioinformatics 4 Computer wiring 5 Delivery route optimization The TSP is known to be NPhard meaning that as the number of cities increases the time required to find the optimal solution grows exponentially This makes it an excellent candidate for exploring heuristic algorithms This solution implements a TSP solver using the Shotgun Hill Climbing algorithm Heres a highlevel overview of its functionality 1 It reads a distance matrix from a CSV file representing the distances between cities 2 It uses a shotgun approach which involves multiple restarts of a hill climbing search 3 For each restart It generates a random initial tour It then repeatedly attempts to improve this tour using 2opt swaps If no improvement is found it moves to the next restart 4 After all restarts it returns the best tour found across all attempts The algorithm balances between exploration through multiple random restarts and ex ploitation through hill climbing aiming to find a good approximate solution to the TSP in a reasonable amount of time Write a parallel version of the this solution Input The first line of the input contains 1 the umber of iterations for each hill climbing attempt 2 the number of restarts shotgun attempts 3 a seed for the random number generator The following lines contains the distance matrix describing the distances of the cities The input must be read from the standard input 2 Output The program outputs 1 The best tour found represented as a sequence of city indices 2 The total length of this tour In this output the sequence of numbers represents the order in which cities are visited in the best tour found and the tour length represents the total distance traveled in this tour The output must be written to the standard output Example Sample input 1 Sample output 1 2000 20 17 Best tour found 0 2 1 010203 Tour length 03 000100 020001 3 Problem C Zeros of Riemann Zeta Function Luiz A Steffenel Zeros of the Riemann zeta function denoted as ζs are the values of the complex vari able s for which ζs 0 There are two types of zeros trivial and nontrivial Trivial zeros occur at negative even integers s 2 4 6 These are considered trivial because their existence is relatively easy to prove using the functional equation of the zeta function Nontrivial zeros are the complex values of s for which ζs 0 and have real part be tween 0 and 1 They are called nontrivial because their distribution is less understood and their study is important for understanding prime numbers and related objects in num ber theory Riemann Hypothesis is one of the most famous and longstanding unsolved problems in mathematics proposed by Bernhard Riemann in 1859 It conjectures that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 12 The code related to this challenge computes the number of zeros on the critical line of the Zeta function The objective is not to compute the zeros we count them to check that they are on the Riemann Line The exercise is to sample a region on the critical line to count how many times the function changes sign so that there is at least 1 zero between 2 sampling points Here we use a constant sampling but you can recode entirely the way to proceed Write a parallel version of the this solution Input In the first line of the input youll receive three integers the first L means the lower bound the cedont U mena sthe upper bound and the last S means how many samples will be tested The input must be read from the standard input Output The output contains a single line It has the total number of zeros The output must be written to the standard output 4 Examples Sample input 1 Sample output 1 10 1000 100 I found 649 Zeros 5 Problem D The Tree Center Tiago A O Alves The eccentricity of a node u in a Graph GV E is the maximum distance between u and another node v v V In a tree the center is the node with the minimum eccentricity You can see an example at Figure 10 2 4 6 7 3 5 9 8 1 Figure 1 A tree in which the center is the node 4 with maximum distance 3 This problem regards taking a tree as input and returning its center Your assignment is to develop a parallel solution for solving this problem Input The first line of input has an integer meaning the total number of nodes on the graph The following lines has the undirected edges between nodes The input must be read from the standard input Output The output contains the label of the center for a tree The output must be written to the standard output 6 Example Sample input 1 Sample output 1 10 4 10 4 10 2 10 6 4 7 6 3 6 5 7 9 7 8 8 1 7 Parallelization of a Radix Sort Algorithm Calebe de Paula Bianchini calebebianchinimackenziebr 1 The Problem Description Given a set of unsorted items with keys that can be considered as a binary representation of an integer the bits within the key can be used to sort the set of items This method of sorting is known as Radix Sort Write a program that includes a threaded version of a Radix Sort algorithm that sorts the keys read from an input file then output the sorted keys to another file The input and output file names shall be the first and second arguments on the command line of the application execution The first line of the input text file is the total number of keys N to be sorted this is followed by N keys one per line in the file A key will be a sevencharacter string made up of printable characters not including the space character ASCII 0x20 The number of keys within the file is less than 231 1 Sorted output must be stored in a text file one key per line Timing If you put timing code into your application to time the sorting process and report the elapsed time this time will be used for scoring If no timing code is added the entire execution time including time for input and output will be used for scoring 2 The Serial Solution The Radix Sort algorithms can be classified in two basic groups 1 2 least significant digit LSD and most significant digit MSD The LSD approach examines the digits in the keys in a righttoleft order working with the least significant digits The other approach MSD examines the digits in the key in a lefttoright order Figure 1 3 show an example of LSD and MSD radixsort using 3digits integer keys Figure 1 Summary of LSD and MSD approaches When a digit is chosen for MSD radixsort eg 2nd column of MSD picture an internal sorting is used to organize the keys 1 grouping them with the same digit The serial solution proposed uses a MDS radixsort algorithm to sort the keys and uses a fatpivot quicksort algorithm threeway partitioning 1 4 for internal sorting taking advantage of repeating digits 3 The First Parallel Solution Attempt Studying carefully the MSD approach it is possible to realize that when the first significant digit is ordered 1st step all the keys are partitioned by its first digit when the second significant digit is ordered 2nd step the keys are partitioned again Considering that each partition is a job and a job can be allocated for one processor without race condition it is possible to project a parallel algorithm for radixsort For threadcontrol the solution uses OpenMP 4 but still uses POSIX mutex and semaphores for IPC and critical section The idea is creating a job every time a partition is identified This job is put into a pool of jobs Each processor retrieves jobs from the pool like producerconsumer 5 and sorts the partkeys In this partsorting more jobs are created because other partitions are identified see Figure 1 When the pool of jobs is empty there are no jobs anymore and the entire keys are sorted stopping the algorithm It would be an interesting approach if there were not a critical section producerconsumer mutexsemaphore The critical section is a bottleneck for this solution Moreover the sorting algorithm is very fast for a small amount of keys 1000000 being faster than these producerconsumer approach Table 1 show a simple time comparison for the serial solution and a dualprocess parallel solution the externaltime means total time and the parenthesestime indicates the first most significant digit sort time Table 1 Simple time comparison in seconds Number of keys Serial Solution 1st Parallel Solution 2 proc 1000 0000487 0010744 0000369 100000 0063586 0525342 0025772 500000 0389194 73858603 0120917 1000000 0832710 879922349 0218579 50000000 47611455 ω 11061750 100000000 104441001 ω 23215623 4 The Final Parallel Solution The final parallel solution uses the same idea from the last parallel solution Beyond the normal threads an additional thread is created only for synchronization it will remain sleeping until all the other threads reach an end similar to readerwriter problem 5 This thread synchronization guarantees the values to the output file Analyzing the input ASCII from 0x21 to 0x7E 6 there are only 94 printable characters In other words only 94 partition after sorting the 1st most significant digit Considering a balanced distribution each of these partitions will contain 22845570 keys and based on the jobs problem from Table 1 it is not a good idea to have more jobs So after the creation of these 94 jobs each thread sorts the keys on the partition by itself If these keys are unbalanced there is a threshold that indicates to create a job is a partition has more than 22845570 keys Besides the first job for the first most significant digit begins to be the slowest part of the execution for a number of keys above 100000000 The iterative quicksort has to be improved and checkpoints are created These checkpoints indicate that the quicksort already have sorted part of the total keys After that the partitions until the checkpoint are verified and all the jobs created Thats mean from each checkpoint new jobs are put on the pool for the parallelthread sorting This improvement for quicksort considers the use of virtual memory in the hardwaresoftware specification too All the jobs created sorts first the firsts elements of the array avoiding too much swap from the disk to the memory 2311 keys with 71 character length will consume at least 16Gbytes of RAM Table 2 show the values collected from an 8processor machine for a time comparison between the algorithms Table 2 Comparison between algorithms in seconds Number of keys Serial Solution Parallel Solution 2proc 4proc 8proc 1000 0000487 0008024 0008484 0105758 100000 0063586 0041309 0031875 0104594 500000 0389194 0225825 0135147 0128708 1000000 0832710 0502728 0289878 0236469 50000000 47611455 27055081 15870333 11025388 100000000 104441001 59439492 33920247 23915724 500000000 511607666 287287585 166408876 122619186 1000000000 1174523109 595044868 355776813 263528890 5 Conclusion Sorting is the most common task in computer Because ordered data are easy to manipulate many software require sorted data But the amount of the data sorted is fundamental to construct an efficient parallel sort algorithm 7 The data have to be organized on memory and the processes have to manipulate it in a coordinate manner The results from last section show that a parallel algorithm is easy to project but it is not so easy to get a linear speedup see Table 2 In general considering a good sort algorithms a small amount of data sorted in a serial execution is faster than the same approach with a parallel algorithm Thus a good approach is to consider the amount of data to decide the way to sort serial or parallel 6 References 1 Sedgewick R Algorithms in Java Parts 14 3th edition AddisonWesley 2002 2 Wikipedia Radix Sort URL httpenwikipediaorgwikiRadixsort Accessed in 042009 3 Figure 1 RadixSort URL httpwwwstrchrcommediaradixsortpng Accesses in 042009 4 Wikipedia Quicksort URL httpptwikipediaorgwikiQuicksort Accessed in 042008 4 OpenMP OpenMPorg URL httpopenmporgwp Accessed in 042009 5 Tanembaum A Modern Operation Systems PrenticeHall 2007 6 Wikipedia ASCII URL httpenwikipediaorgwikiASCII Accessed in 042009 7 Grama A Gupta A Karypis G Kumar V Introduction to Parallel Computing AddisonWesley 2003 Entrega Final A partir do tema escolhido para o Projeto desenvolva uma solução paralela Apresente a ideia em um texto semelhante ao writeup comentando as vantagens e desvantagens de sua solução Mostre também os tempos de execução da versão sequencial e da versão paralela em uma tabela Mostre as curvas de speedup usando escalabilidade forte como objetivo discutindo brevemente essas curvas Todas essas informações devem estar em seu writeup Portanto seu relatório deve ter 10 ponto tabela de tempo e gráfico de speedup local 10 ponto tabela e gráfico de eficiência local 10 ponto texto comparativo entre essas tabelas e considerações sobre escalabilidade forte 25 pontos discussão sobre o projeto do algoritmo paralelo e a soluçãoconstrução feita utilize o processo de identificação de grãos e mostreos 25 ponto explicação de algum detalhe do códigofonte relativo à solução do algoritmo em especial sobre a parte paralelizada 10 ponto listagem da bibliografia utilizada e corretamente referenciada nas explicações 10 ponto proporcional ao speedup alcançado o limite mínimo do speedup é 2 19th Marathon of Parallel Programming SSCAD 2024 Calebe P Bianchini1 and Helio C Guardia2 1Mackenzie Presbyterian University 2Universidade Federal de Sao Carlos Rules for Remote Contest For all problems read carefully the input and output session For all problems a sequen tial implementation is given and it is against the output of those implementations that the output of your programs will be compared to decide if your implementation is correct You can modify the program in any way you see fit except when the problem descrip tion states otherwise You must upload a compressed file zip with your source code the Makefile and an execution script The script must have the name of the problem You can submit as many solutions to a problem as you want Only the last submission will be considered The Makefile must have the rule all which will be used to compile your source code The execution script runs your solution the way you design it it will be inspected not to corrupt the target machine The execution time of your program will be measured running it with time program and taking the real CPU time given Each program will be executed at least three times with the same input and the mean time will be taken into account The sequential program given will be measured the same way You will earn points in each problem correspond ing to the division of the sequential time by the time of your program speedup The team with the most points at the end of the marathon will be declared the winner This problem set contains 4 problems pages are numbered from 01 to 07 General Information Compilation You should use CC or CXX inside your Makefile Be careful when redefining them There is a simple Makefile inside you problem package that you can modify Example FLAGSO3 EXECsum CXXg all EXEC EXEC CXX FLAGS EXECcpp c o EXECo CXX FLAGS EXECo o EXEC Each judge machine has its group of compilers See them below and choose well when writing your Makefile The compiler that is tagged as default is predefined in CC and CXX variables machine compiler command host GCC 940 default C gcc C g MPI Open MPI 411 default C mpicc C mpic GCC 940 C gcc C g gpu NVidia CUDA 1201 default C nvcc C nvcc GCC 831 C gcc C g Submitting General information You must have an execution script that has the same name of the problem This script runs your solution the way you design it There is a simple script inside you problem package that should be modified Example binbash This script runs a generic Problem A Using 32 threads and OpenMP export OMPNUMTHREADS32 OMPNUMTHREADS32 sum Submitting MPI If you are planning to submit an MPI solution you should compile using mpiccmpic The script must call mpirunmpiexec with the correct number of processes max 160 binbash This script runs a generic Problem A Using MPI in the entire cluster 4 nodes mpirun np 4 sum Comparing times results In your personal machine measure the execution time of your solution using time pro gram Add inputoutput redirection when collecting time Use diff program to compare the original and your solution results Example time p A originalinputtxt myoutputtxt real 494 user 008 sys 156 diff myoutputtxt originaloutputtxt Do not measure time and do not add inputoutput redirection when submitting your so lution the autojudge system is prepared to collect your time and compare the results Problem A Optimizing Media Storage on Blu Rays Leonardo M Takuno Anyone who has ever recorded a variety of media onto CDs DVDs or Blu Rays knows that optimizing the use of space on each disk was never an easy task Sometimes it was necessary to adjust and reorganize movies photos and music to fit everything properly onto the disks Isabela needs to record a mix of movies photos and music onto her Blu Rays She has a collection of digital files on her computer and would like to distribute them across several Blu Rays She knows that each Blu Ray has a maximum storage capacity and is aware of the size of each type of file However she is having trouble deciding which files to put on which Blu Ray to maximize the use of each disk Write a parallel version of the this solution Input The first line of input consists of two positive integers N and K which represent the total number of files movies photos and music on Isabelas computer and the number of Blu Rays she has The second line of input consists of N positive integers which represent the size in gigabytes of each file The last line of input consists of K positive integers which represent the maximum storage capacity in gigabytes of each Blu Ray No file is larger than 50 gigabytes and no Blu Ray has a capacity greater than 50 gigabytes The input must be read from the standard input Output Print a single line containing the maximum total number of gigabytes of files that can be recorded on the Blu Rays The output must be written to the standard output Example Sample Input 1 Sample output 1 8 3 18 37 46 37 47 1 10 46 17 1 7 17 1 Problem B Traveling Salesman Problem Solver Lucas S Rosa Alfredo Goldman The Traveling Salesman Problem TSP is a classic optimization challenge in computer science and operations research It asks the question Given a list of cities and the distances between each pair of cities what is the shortest possible route that visits each city exactly once and returns to the origin city This problem is of great importance in various realworld applications including 1 Logistics and transportation planning 2 Circuit board drilling in manufacturing 3 DNA sequencing in bioinformatics 4 Computer wiring 5 Delivery route optimization The TSP is known to be NPhard meaning that as the number of cities increases the time required to find the optimal solution grows exponentially This makes it an excellent candidate for exploring heuristic algorithms This solution implements a TSP solver using the Shotgun Hill Climbing algorithm Heres a highlevel overview of its functionality 1 It reads a distance matrix from a CSV file representing the distances between cities 2 It uses a shotgun approach which involves multiple restarts of a hill climbing search 3 For each restart It generates a random initial tour It then repeatedly attempts to improve this tour using 2opt swaps If no improvement is found it moves to the next restart 4 After all restarts it returns the best tour found across all attempts The algorithm balances between exploration through multiple random restarts and ex ploitation through hill climbing aiming to find a good approximate solution to the TSP in a reasonable amount of time Write a parallel version of the this solution Input The first line of the input contains 1 the umber of iterations for each hill climbing attempt 2 the number of restarts shotgun attempts 3 a seed for the random number generator The following lines contains the distance matrix describing the distances of the cities The input must be read from the standard input 2 Output The program outputs 1 The best tour found represented as a sequence of city indices 2 The total length of this tour In this output the sequence of numbers represents the order in which cities are visited in the best tour found and the tour length represents the total distance traveled in this tour The output must be written to the standard output Example Sample input 1 Sample output 1 2000 20 17 Best tour found 0 2 1 010203 Tour length 03 000100 020001 3 Problem C Zeros of Riemann Zeta Function Luiz A Steffenel Zeros of the Riemann zeta function denoted as ζs are the values of the complex vari able s for which ζs 0 There are two types of zeros trivial and nontrivial Trivial zeros occur at negative even integers s 2 4 6 These are considered trivial because their existence is relatively easy to prove using the functional equation of the zeta function Nontrivial zeros are the complex values of s for which ζs 0 and have real part be tween 0 and 1 They are called nontrivial because their distribution is less understood and their study is important for understanding prime numbers and related objects in num ber theory Riemann Hypothesis is one of the most famous and longstanding unsolved problems in mathematics proposed by Bernhard Riemann in 1859 It conjectures that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 12 The code related to this challenge computes the number of zeros on the critical line of the Zeta function The objective is not to compute the zeros we count them to check that they are on the Riemann Line The exercise is to sample a region on the critical line to count how many times the function changes sign so that there is at least 1 zero between 2 sampling points Here we use a constant sampling but you can recode entirely the way to proceed Write a parallel version of the this solution Input In the first line of the input youll receive three integers the first L means the lower bound the cedont U mena sthe upper bound and the last S means how many samples will be tested The input must be read from the standard input Output The output contains a single line It has the total number of zeros The output must be written to the standard output 4 Examples Sample input 1 Sample output 1 10 1000 100 I found 649 Zeros 5 Problem D The Tree Center Tiago A O Alves The eccentricity of a node u in a Graph GV E is the maximum distance between u and another node v v V In a tree the center is the node with the minimum eccentricity You can see an example at Figure 10 2 4 6 7 3 5 9 8 1 Figure 1 A tree in which the center is the node 4 with maximum distance 3 This problem regards taking a tree as input and returning its center Your assignment is to develop a parallel solution for solving this problem Input The first line of input has an integer meaning the total number of nodes on the graph The following lines has the undirected edges between nodes The input must be read from the standard input Output The output contains the label of the center for a tree The output must be written to the standard output 6 Example Sample input 1 Sample output 1 10 4 10 4 10 2 10 6 4 7 6 3 6 5 7 9 7 8 8 1 7

Sua Nova Sala de Aula

Sua Nova Sala de Aula

Empresa

Central de ajuda Contato Blog

Legal

Termos de uso Política de privacidade Política de cookies Código de honra

Baixe o app

4,8
(35.000 avaliações)
© 2025 Meu Guru®