Openmp, CellSs, Codeplay


Postby kengreen » 04 Oct 2009, 01:21

I am interested in these "pre-compilers" that promise to simplify coding for the SPU. All of them require annotating C/C++ code to indicate the parts that can be offloaded to the SPUs. CellSs is part of YDL's distribution, and OpenMP can be used with the GNU compiler (4.2 or later) that ships with YDL. Codeplay is about to release its free Offload Community Edition for the PS3, which includes a compiler, debugger and profiler. Fixstars used to have a comparison between CellSs and OpenMP on their website, but I can't find it anymore. Does anyone have experience with how these work, how easy they are to use, and how close they come to hand-optimized SPU code?
kengreen
ydl lover
Posts: 78
Joined: 08 May 2008, 01:22
Location: arkansas

Re: Openmp, CellSs, Codeplay

Postby kengreen » 09 Oct 2009, 18:48

I got the following from "CellSs: Making it easier to program the Cell
Broadband Engine processor" by J. M. Perez, P. Bellens,
R. M. Badia, J. Labarta in IBM J. RES. & DEV. VOL. 51 NO. 5
SEPTEMBER 2007

"In OpenMP, the programmer explicitly indicates what is
parallel and what is not, while in CellSs, the programmer
identifies pieces of code that are independent of one
another... OpenMP is more appropriate for applications with
parallelism at the loop level, while CellSs fits better for
applications with parallelism at the function level.
However, given the current 3.0 extensions of OpenMP,
which define the concept of task, both programming
models can converge into one in the future. We plan to
perform the integration of OpenMP and CellSs and
develop other extensions."
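To make the loop-level versus function-level distinction in that quote concrete, here is a minimal sketch in plain OpenMP only (no CellSs pragmas). The function names are made up for illustration, and the task version needs an OpenMP 3.0 compiler (GCC 4.4 or later), so it is not something the gcc 4.3 mentioned in this thread would parallelize. Without -fopenmp the pragmas are ignored and both versions run serially with the same result.

```c
#include <stddef.h>

/* Loop-level parallelism: the iterations of one loop are split across threads. */
void scale_loop(float *v, size_t n, float k)
{
    long i;
    #pragma omp parallel for
    for (i = 0; i < (long)n; i++)
        v[i] *= k;
}

/* Plain serial helper; each call is one independent unit of work. */
static void scale_block(float *v, size_t n, float k)
{
    size_t i;
    for (i = 0; i < n; i++)
        v[i] *= k;
}

/* Function-level parallelism with OpenMP 3.0 tasks: one thread creates a
 * task per block, and any thread in the team may execute it.
 * Assumes block > 0. */
void scale_tasks(float *v, size_t n, float k, size_t block)
{
    #pragma omp parallel
    #pragma omp single
    {
        size_t start;
        for (start = 0; start < n; start += block) {
            size_t len = (n - start < block) ? n - start : block;
            #pragma omp task firstprivate(start, len)
            scale_block(v + start, len, k);
        }
        #pragma omp taskwait
    }
}
```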


I compiled and ran some of the examples in the CellSs folder
from /usr/share/doc/cellss/examples/matmul/ with these results:

matmul_novec  10    1,518 MFlops
matmul_vec1   10    7,072 MFlops
matmul_vec2   10   20,985 MFlops
matmul_vec3   10   35,905 MFlops

For comparison, I modified the matmul_novec code to build with
plain gcc, and then modified that version to use OpenMP.

matmul_gcc    10      531 MFlops
matmul_omp    10      152 MFlops

I am sure that others could improve the OpenMP results. I don't have
the IBM XLC-SSC compiler but gcc is supposed to work with OpenMP.
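For anyone who wants to try improving on this, here is a rough sketch of what an OpenMP matrix multiply and its MFlops calculation can look like. This is not the code from the CellSs examples or my modified version of it; the function names are hypothetical, and the MFlops formula assumes the usual count of 2*n^3 floating-point operations for an n x n multiply.

```c
#include <stddef.h>

/* C = A * B for n x n row-major matrices.
 * Rows of C are distributed across the OpenMP threads; j and k must be
 * private so each thread has its own inner loop counters. */
void matmul_omp(const float *a, const float *b, float *c, int n)
{
    int i, j, k;
    #pragma omp parallel for private(j, k)
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            float sum = 0.0f;
            for (k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* MFlops for one n x n multiply, assuming 2*n^3 operations
 * (one multiply and one add per inner iteration). */
double matmul_mflops(int n, double seconds)
{
    return 2.0 * n * n * n / seconds / 1.0e6;
}
```

Time the call to matmul_omp with a wall-clock timer and pass the elapsed seconds to matmul_mflops.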

Re: Openmp, CellSs, XLC-SSC

Postby kengreen » 22 Oct 2009, 11:15

Mnjul and others took the DES source code from GNU Privacy Guard and modified it to run under IBM's ALF and the XLC-SSC compiler. Encrypting a 24 MB file, their best results were:

IBM ALF             80 MB encrypted/sec
IBM XLC-SSC (0.9)   60 MB encrypted/sec

I modified their SSC source code to run on OpenMP and CellSs. Encrypting the IBM redbook.pdf file (10,677,673 bytes), I got:

gcc43 -fopenmp      40 MB encrypted/sec
CellSs              24 MB encrypted/sec
IBM XLC-SSC         57.4 MB encrypted/sec (similar to their result)

Re: Openmp, CellSs, XLC-SSC

Postby ppietro » 22 Oct 2009, 17:34

kengreen wrote: Using the DES source code from GNU Privacy Guard, Mnjul and others modified it to run under IBM's ALF and XLC-SSC compiler. They got the following optimal results using a 24 MB file to encrypt: IBM ALF 80 MB encrypted/sec, IBM XLC-SSC (0.9) 60 MB encrypted/sec. I modified their SSC source code to run on OpenMP and CellSs, with the following results encrypting the IBM redbook.pdf file (10,677,673 bytes): gcc43 -fopenmp 40 MB encrypted/sec, CellSs 24 MB encrypted/sec. I got 57.4 MB encrypted/sec using IBM XLC-SSC, similar to their results.


Interesting. I was reading through some of the documentation about this, and I have a question.

I see that you can use OpenMP with gcc - but do you know for sure that it's using the SPEs? You might just be using better threading on the PPE.

As for the Fixstars comparison page, I can only find it in Google's cache. Searching for "XLC SSC" will find it.

Thanks for posting your results. :D

Cheers,
Paul
ppietro
Site Admin
Posts: 4965
Joined: 13 Sep 2007, 22:18

Re: Openmp, CellSs, XLC-SSC

Postby kengreen » 23 Oct 2009, 14:00

I don't know. I get similar results whether I compile des.c and main.c as separate files or compile the single combineddes.c file using ppu-gcc43 -fopenmp: .269822 sec and .276592 sec. When I compile des.c to des.o and link it with main.c using cellss-cc with CSS_MAX_SPUS=6, or compile combineddes.c with cellss-cc, the results are also similar: .424064 sec and .435846 sec. Finally, XLC-SSC also gives similar results for the two builds: .185912 sec and .190438 sec. This doesn't answer your question, but it seems to indicate that I am getting somewhere. IBM XLC-SSC doesn't want to compile the matmul examples from /usr/share/doc/cellss/examples/matmul, so I can't comment on that. I don't know if and when Codeplay's Offload Community Edition will be available.
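As a sanity check, these timings can be related to the MB/sec figures reported earlier in the thread. The sketch below assumes the timings are for one pass over the same 10,677,673-byte redbook.pdf file and that MB means 10^6 bytes; the function name is just for illustration.

```c
/* Throughput in MB/sec (1 MB = 10^6 bytes) for a given byte count
 * and elapsed wall-clock time in seconds. */
double throughput_mb_per_s(double bytes, double seconds)
{
    return bytes / seconds / 1.0e6;
}
```

Under those assumptions, .269822 sec works out to about 39.6 MB/sec for gcc43 -fopenmp, .424064 sec to about 25.2 MB/sec for cellss-cc, and .185912 sec to about 57.4 MB/sec for XLC-SSC, which is roughly in line with the 40, 24 and 57.4 quoted before.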

Re: Openmp, CellSs, XLC-SSC

Postby ppietro » 23 Oct 2009, 19:09

kengreen wrote: I don't know. I get similar results whether I compile des.c and main.c as separate files or compile the single combineddes.c file using ppu-gcc43 -fopenmp: .269822 sec and .276592 sec. When I compile des.c to des.o and link it with main.c using cellss-cc with CSS_MAX_SPUS=6, or compile combineddes.c with cellss-cc, the results are also similar: .424064 sec and .435846 sec. Finally, XLC-SSC also gives similar results for the two builds: .185912 sec and .190438 sec. This doesn't answer your question, but it seems to indicate that I am getting somewhere. IBM XLC-SSC doesn't want to compile the matmul examples from /usr/share/doc/cellss/examples/matmul, so I can't comment on that. I don't know if and when Codeplay's Offload Community Edition will be available.


If you have access to a terminal window, you can use the spu-top command to see spu load. That might help determine where the code is going. :)

Cheers,
Paul

Re: Openmp, CellSs, Codeplay

Postby kengreen » 24 Oct 2009, 14:28

spu-top indicated that both the cellss-cc and XLC-SSC compilers generate code that runs on all six SPUs. gcc43 -fopenmp does not generate SPU code. I know that I need gcc 4.2 or later to generate OpenMP code, and I got ppu-gcc43 and spu-gcc43 (version 4.3.2) from the Barcelona Supercomputing Center site. The code for XLC-SSC is the same as for OpenMP, containing a #pragma omp parallel for private(i) statement. I am at a loss to explain why the cellss-cc compiler generates such slow code for DES while it generates much faster code for the matmul examples.
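For readers who haven't seen that pragma in context, here is a minimal sketch of the pattern: a loop over independent 8-byte blocks (the DES block size) with #pragma omp parallel for private(i). The per-block function is a hypothetical XOR stand-in, not DES and not the GnuPG or SSC code, and the names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 8  /* DES operates on 64-bit (8-byte) blocks */

/* Hypothetical stand-in for the real cipher: XORs the block with the key.
 * This is NOT DES; it only gives the loop something independent to do. */
static void toy_encrypt_block(uint8_t *block, const uint8_t key[BLOCK_BYTES])
{
    int b;
    for (b = 0; b < BLOCK_BYTES; b++)
        block[b] ^= key[b];
}

/* Each block is processed independently, so the loop iterations can be
 * shared among threads. The loop counter is private by default in a
 * parallel for; listing it in private(i) is allowed and makes it explicit. */
void encrypt_buffer(uint8_t *buf, size_t nblocks, const uint8_t key[BLOCK_BYTES])
{
    long i;
    #pragma omp parallel for private(i)
    for (i = 0; i < (long)nblocks; i++)
        toy_encrypt_block(buf + (size_t)i * BLOCK_BYTES, key);
}
```

Where the iterations actually execute depends on the compiler: the same source can end up on PPE threads with one toolchain and on the SPUs with another.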

Re: Openmp, CellSs, Codeplay

Postby kengreen » 03 Nov 2009, 17:35

To end this thread, since no one has picked it up, I will add my observations. First, I don't know when Codeplay's Offload will be available to non-SCE developers. No one has replied to my emails. It seems that Offload will take more effort than XLC-SSC or CellSs to modify a scalar C program to run on the Cell processor's SPUs. Second, both IBM's XLC-SSC and BSC's CellSs generate code that runs on the SPUs with minimal additions to scalar C programs, i.e. #pragma directives. CellSs is free and included in YDL. The current version of IBM's XLC-SSC is not free, but the previous version, 0.9, is free and available with the Extras of IBM SDK 3.0. CellSs accepts only a single ".c" source file, although it will link in other ".o" object files. XLC-SSC won't accept the SPU intrinsic functions that CellSs accepts. Between the two, I like XLC-SSC better. It seems to produce faster code. BSC plans to merge CellSs with OpenMP 3.0 some time in the future. IBM has just released a Cell-specific compiler for OpenCL, which I will experiment with next.

Re: Openmp, CellSs, Codeplay

Postby criley » 05 Nov 2009, 14:19

kengreen wrote: To end this thread, since no one has picked it up, I will add my observations. First, I don't know when Codeplay's Offload will be available to non-SCE developers. No one has replied to my emails. It seems that Offload will take more effort than XLC-SSC or CellSs to modify a scalar C program to run on the Cell processor's SPUs.


Hi kengreen,

I'm Colin, a lead developer on Codeplay Offload. Sorry we didn't reply to your email sooner; our marketing folk were away at the start of the week and only returned today.

We hope to release a beta version of the Cell Linux Offload system in two weeks. In the meantime, you can see a quick overview of the system here, more on call graph duplication (which is what makes Codeplay Offload so powerful), and also the language extensions specification document here.

A few things that may not be on the site, but that I can tell you here:
- It's not just a pre-compiler; it's a fully vectorizing compiler with a built-in, optimized vmx2spu for when VMX intrinsics are found in functions you want offloaded to the SPU.
- SPU function overlay generation, via a simple attribute.
- It can handle function pointers and virtual functions inside offload blocks.
- Call graph duplication allows complex function calls to be duplicated and compiled for the SPU, using automated SWCache accesses when required.
- Ability to overload functions with the __offload modifier, which allows type-safe SPU intrinsics, DMA and other SPU-native features, all in a single-source fashion.

Please note that Codeplay Offload is not an auto-parallelizing compiler. It is vectorizing, and optimizes for the SPU when needed. The only problem I can see is that you talk about C files, whereas Offload is a C++ compiler, so you may get the usual C->C++ warnings/errors at first. I look forward to seeing what you can do with it when it is released :)

Colin.
criley
ydl newbie
Posts: 1
Joined: 05 Nov 2009, 13:58

