Asg 1: Ray Tracer OMP

Objectives

To parallelize the ray tracer with OpenMP (see also: wikipedia) using as many cores as there are available on any given system

Assignment

Parallelize the C++ ray tracer from the previous assignment using omp compiler directives
1. add -fopenmp to the CFLAGS line in the Makefile
2. using integer variables tid and ncores in main() find out how many cores there at your disposal via this piece of code: #pragma omp parallel private(tid) { if((tid = omp_get_thread_num())==0) ncores = omp_get_num_threads(); }
3. set integer variable chunk to the number of rows (height of the image to be ray traced) divided by the number of cores—this stipulates how many rows of the image each core will process—we parallelize the ray tracer by chopping up the image into horizontal bands, and having each core process each band of pixels simultaneously
4. insert this compiler directive immediately above the nested for loops iterating over the image pixels (x,y): #pragma omp parallel for \ shared(...fill this in...) \ private(...fill this in...) \ schedule(static,chunk) one important aspect of this assignment is to properly figure out which variables are meant to be shared among parallel processes and which are meant to only be manipulated privately
5. make sure that the nested loops write to img, an array of (pointer to) type rgb_t<unsigned char>, allocated to size w*h and that the code inside the loops does not write to std::cout directly—provide a second doubly-nested loop to ouptut the pixels after the parallelized nested loops finish
The above is almost all there is to it...if the ray tracer had been designed with multiple (simultaneous) rays in mind, the above would be all that is needed. Unfortunately, the ray tracer suffers from a serial design decision, one that ever only expected a single ray at any given time. The problem shows up in the call obj->getlast_hit() This suggests that an object stores the surface point (and normal) of the last ray that hit it. When there was only one ray at a time, querying the object for this piece of info made sense. Now, however, this design strategy leads to a race condition: in between the time that some ray hit an object, another ray came along and hit it again. The first ray, when querying the object for the hit point, gets the second ray's location. This leads to rather nasty color banding problems. To fix this:
1. rewrite ray_t::trace() so that it no longer needs and object_t *last_hit argument
2. the old model_t::find_closest() routine won't work anymore because its initial design assumed that a ray would hit any given object only once (and reflect). This won't work when there are two (or more) rays hitting an object at the same time or when transparent objects have volume, through which the transparent ray needs to exit, once it has penetrated it—see Asg 2.
  This means, particularly for transparent spheres, that the ray will hit the sphere twice, and so the find_closest() routine needs to be rewritten:
  1. rename the local variable mindist to closest_dist and set it to INFINITY instead of to -1
  2. rename the variable obj to c_obj and rename the variable closest to closest_obj
  3. rename the local variable dist to c_dist and rename the lcoal variable mindist to closest_dist
  4. ensure that as arguments to find_closest you now have four vec_t pass-by-reference variables pos, dir, hit, and N and one double dist
  5. remove the obj==last_obj || evaluation from the initial call to obj->hits()—the current object should only be skipped if the returned dist < 0
  6. only record the closest distance, closest object, closest hit point, and closest normal if (0.00001 < c_dist) && (c_dist < closest_dist) (the distance to the intersection point must be non-zero) making sure that you called obj->hits with arguments pos, dir, c_hit and c_N (note that hits will have to be rewritten; see below)
  7. by "recording the closest distance" I mean that closest_dist, closest_obj, closest_hit, and closest_N all get set to c_dist, c_obj, c_hit, and c_N, respectively, if the above if statement is true
  8. note that model::find_closest() must now overwrite the incoming arguments vec_t hit and vec_t N and add closest_dist to dist at the intersection point only if closest_obj is not NULL (that is, hit and N get set to closest_hit and closest_N, respectively).
  9. finally, return closest_obj
  The find_closest function in its logic should resemble a typical "find the minimum value in an array" problem. We're just setting dist, hit, and N (instead of a single value) and doing so for the closest object (instead of the minimum).
3. rewrite all the objects' hits() routines so that they overwrite the incoming hit point and normal pass-by-reference arguments—the objects' (internal) normal must not be altered, and no last_hit should be stored by the object (this should be removed from the object_t class).
  For the plane, it will just set the incoming argument to the plane's normal. For the sphere, the incoming argument will be set to the sphere's normal at the intersection hit point (N = 1.0/radius * (hit - center)); these variables are handled in a similar manner to dist, the distance to the closest hit point from the ray's current position, meaning that they are all pass-by-reference
Compare the timing of the old serial approach vs. the parallel one—is there a benefit? (Simply comment out the #pragma omp directive to run the serial version (once your parallelized version works); on my dual-core laptop, the parallel version takes 1.517 seconds, while the serial version takes 2.934 seconds, not quite a factor of 2 reduction, but 1.9, which is noticeable. On a quad-core machine, the speedup should be even better.
As with the previous assignment, the image generated should be identical:

Hints

Shared or private? The idea is variables that are meant to change indepentenly should be made private. As an example, the ray and color should be private as each independent process has its own version of these variables. What about the model? Should there be many copies of these, or just one? Just the one...so it should be shared. Is there one img or many? Just one, so it too should be shared. What about w and h, the image dimensions? Do these change per process or do they stay the same?

In general, any time you need mutual exclusivity, you need privacy. But, the img is shared because access to its pixels is (must be) made exclusive, i.e., no two processes should write to the same pixel.

Here are all the relevant (to ray tracing) variable declarations:

        model_t        model;                   // model (the world)
        int            tid,ncores=1;            // thread id, no. of cores
        int            w=model.getpixel_w();    // image width (screen coords)
	int	       h=model.getpixel_h();    // image height (screen coords)
        int            chunk;                   // no. rows per thread (core)
        double         wx,wy,wz=0.0;            // pixel in world coords
        double         ww=model.getworld_w();   // world width (world coords)
	double	       wh=model.getworld_h();   // world height (world coords)
        vec_t          pos=model.getviewpoint();// camara (ray) position
        vec_t          pix,dir;                 // pixel pos (world), ray dir
        ray_t          *ray=NULL;               // a ray
        rgb_t<double>  color;                   // color set by ray
        rgb_t<uchar>   *imgloc,*img=NULL;       // image and image location ptr

Turn in

Turn in all of your code, in one tar.gz archive of your asg##/ directory, including:

A README file containing
1. Course id--section no
2. Name
3. Lab description
4. Brief solution description (e.g., program design, description of algorithm, etc., however appropriate).
5. Lessons learned, identified interesting features of your program
6. Any special usage instructions
Makefile
source code (.h headers and .cpp source)
~~object code~~ (do a make clean before tar)

How to hand in

See handin notes