34 Doxygen - 41:41 - 8 years ago
30 C++ Templates And Exceptions - 44:35
32 Vim - 1:05:03 - 8 years ago
33 Basic Git - 54:25 - 8 years ago
31 C++ Exceptions - 31:07 - 8 years ago
26 Nbody and Modern Fortran - 1:00:23 - 8 years ago
27 Code Organization - 45:40 - 8 years ago
29 Genetic Algorithms - 29:58 - 8 years ago
22 Correct Memory Handling In C++ - 39:16
25 N Body Simulation - 23:05 - 8 years ago
24 Correct Memory Management Part 2 - 24:23
23 Debugging MPI Programs - 6:57 - 8 years ago
21 Debugging Your Code - 57:01 - 8 years ago
20 Ghost Nodes and Efficient MPI - 29:08
18 MPI Domain Decomposition - 39:20 - 8 years ago
19 MPI Jacobi and Coding Style - 20:50 - 8 years ago
17 Misc Items and more MPI - 55:20 - 8 years ago
15 A Game Of Telephone - 48:04 - 8 years ago
14 Distributed Computing and MPI - 26:52
10 OpenMP ccNUMA MatPlotLib - 43:06 - 8 years ago
9 Parallel Scalability - 40:19 - 8 years ago
1 Linux - 48:01 - 8 years ago
3 Programming In Linux - 56:04 - 8 years ago
0 Linux - 51:44 - 8 years ago
6 Basic Serial Optimizations - 42:36 - 8 years ago
4 Modern Processors - 38:58 - 8 years ago
Comments
@williamweller8100 - 7 years ago
Hi, I watched your video on running the Gauss-Seidel method in parallel (sadly, comments are disabled on that video). I'm currently working on my senior project, which is all about GPU acceleration. I was given a problem to solve: write a multigrid Poisson solver in parallel. I currently have a version working in C (translated from C++), but it's only single-threaded, so it takes a while to solve large grids. After failing to write a parallel function for the Gauss-Seidel method the first time, I decided to write something else to be passed off to the GPU. The funny thing is, the way the program is written, simply copying values from one 2D array to another takes longer in the GPU implementation (I can only test grids of size 4096 * 4096). The GPU is, of course, about 25x faster than the CPU when doing the copy itself, but I'm not sure how to pass 2D arrays directly to the GPU, so I have to flatten the arrays to one dimension before passing them off, then reconstruct the 2D arrays after the kernel returns. The whole process of flattening and reconstructing takes significantly longer than just letting the CPU copy the values from one 2D array to the other. I've tracked the time each method takes, and the smoothing process, which uses the Gauss-Seidel method, takes by far the longest. If, for example, a solution takes 11 seconds, the smoothing process will take 10 of those seconds (depending on how many times I specify the grids should be smoothed). I was wondering if you had an email I could use to bounce some ideas off you? I'm not sure if you have any experience with CUDA, but either way.
@DalonWork - 7 years ago
Unfortunately, I don't have any experience with GPUs. If your project is in C, then your 2D arrays are probably double pointers, e.g. `double **a`, yes? You can't pass them to the GPU in that format, because the data is not contiguous in memory. However, if you are using static arrays, e.g. `double a[4096][4096]`, you have a chance, because those ARE contiguous in memory. If that is not the case, then I suggest you use a 1D array and index it appropriately, as if it were a 2D array (see en.wikipedia.org/wiki/Row-_and_column-major_order). Then you can pass the array without the packing/unpacking.