omp compiler directives
add -fopenmp to the CFLAGS
line in the Makefile
tid and ncores
in main() find out how many cores there are at your
disposal via this piece of code:
#pragma omp parallel private(tid)
{
if((tid = omp_get_thread_num())==0)
ncores = omp_get_num_threads();
}
set chunk to the number of
rows (height of the image to be ray traced) divided by
the number of cores. This stipulates how many
rows of the image each core will process: we parallelize
the ray tracer by chopping up the image into horizontal
bands and having each core process its own band of pixels
simultaneously
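The two steps above (counting the cores and deriving chunk) might be sketched as follows. The helper names count_cores() and rows_per_core() are hypothetical, not part of the assignment code, and the _OPENMP guard is an addition so that the sketch still builds when -fopenmp is absent.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// count_cores(): hypothetical helper; number of threads OpenMP will run,
// or 1 when the file is compiled without -fopenmp
int count_cores()
{
  int tid = 0, ncores = 1;
#ifdef _OPENMP
  #pragma omp parallel private(tid)
  {
    // only thread 0 writes ncores, so the shared variable is written once
    if ((tid = omp_get_thread_num()) == 0)
      ncores = omp_get_num_threads();
  }
#endif
  (void)tid;  // silence "unused" warnings in the serial build
  return ncores;
}

// rows_per_core(): hypothetical helper; image rows handed to each thread,
// never less than one
int rows_per_core(int h, int ncores)
{
  assert(h > 0 && ncores > 0);
  int chunk = h / ncores;
  return chunk > 0 ? chunk : 1;
}
```

When h is not an exact multiple of ncores, the integer division rounds down; with schedule(static,chunk) the leftover rows are still processed, since chunks are dealt out to the threads in round-robin order.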
for loops iterating over the image pixels
(x,y):
#pragma omp parallel for \
shared(...fill this in...) \
private(...fill this in...) \
schedule(static,chunk)
one important aspect of this assignment is to properly
figure out which variables are meant to be shared among
parallel processes and which are meant to only be manipulated
privately
img,
an array of (pointer to) type
rgb_t<unsigned char>,
allocated to size w*h and that the code
inside the loops does not write to std::cout
directly; provide a second doubly-nested loop to
output the pixels after the parallelized nested
loops finish
obj->getlast_hit()
This suggests that an object stores the surface point (and normal)
of the last ray that hit it. When there was only one ray at a time,
querying the object for this piece of info made sense. Now, however,
this design strategy leads to a race condition: between the time
that one ray hits an object and the time it queries the object for
the hit point, another ray may come along and hit the object again.
The first ray then gets the second ray's location. This leads to
rather nasty color banding problems. To fix this:
ray_t::trace() so that it no longer needs
an object_t *last_hit argument
model_t::find_closest() routine won't work
anymore because its initial design assumed that a ray would
hit any given object only once (and reflect). This won't
work when there are two (or more) rays hitting an object at
the same time or when transparent objects have volume,
through which the transparent ray needs to exit, once it
has penetrated it—see Asg 2.
find_closest() routine needs to be rewritten:
rename mindist to
closest_dist and set it to
INFINITY instead of to -1
rename obj to c_obj
and rename the variable closest to
closest_obj
rename dist to
c_dist and rename the local variable
mindist to closest_dist
find_closest
you now have four vec_t pass-by-reference
variables pos, dir,
hit, and N and one
double dist
remove the obj==last_obj || evaluation from
the initial call to obj->hits(); the
current object should only be skipped if the returned
dist < 0
(0.00001 < c_dist) &&
(c_dist < closest_dist)
(the distance to the intersection point must be
non-zero) making sure that you called
obj->hits with arguments
pos, dir, c_hit
and c_N (note that hits
will have to be rewritten; see below)
closest_dist, closest_obj,
closest_hit, and closest_N
all get set to c_dist, c_obj,
c_hit, and c_N, respectively,
if the above if statement is
true
model_t::find_closest() must now
overwrite the incoming arguments
vec_t hit and vec_t N
and add closest_dist to dist
at the intersection point only if
closest_obj is not NULL
(that is, hit and N
get set to closest_hit and
closest_N, respectively).
closest_obj
The logic of the find_closest function should
resemble a typical "find the minimum value in an array"
problem. We're just setting dist,
hit, and N (instead of a
single value) and doing so for the closest object
(instead of the minimum).
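The "find the minimum" shape described above could look roughly like the sketch below. It is not the assignment's model_t code: vec3, object, zplane, and the std::vector of object pointers are hypothetical stand-ins for vec_t, object_t, a concrete object class, and the model's object list. The variable names inside the loop follow the renaming steps listed earlier.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct vec3 { double x, y, z; };          // stand-in for vec_t

struct object {                           // stand-in for object_t
  virtual ~object() {}
  // returns distance to the hit, or a negative value on a miss;
  // writes hit point and normal to the caller's variables only
  virtual double hits(const vec3 &pos, const vec3 &dir,
                      vec3 &hit, vec3 &N) const = 0;
};

struct zplane : object {                  // toy object: the plane z = depth
  double depth;
  explicit zplane(double d) : depth(d) {}
  double hits(const vec3 &pos, const vec3 &dir,
              vec3 &hit, vec3 &N) const override {
    if (dir.z == 0.0) return -1.0;        // ray parallel to the plane
    double t = (depth - pos.z) / dir.z;
    if (t < 0.0) return -1.0;             // plane is behind the ray
    hit = {pos.x + t * dir.x, pos.y + t * dir.y, depth};
    N = {0.0, 0.0, 1.0};
    return t;
  }
};

const object *find_closest(const std::vector<const object *> &objs,
                           const vec3 &pos, const vec3 &dir,
                           vec3 &hit, vec3 &N, double &dist)
{
  const object *closest_obj = nullptr;
  double closest_dist = INFINITY;
  vec3 closest_hit{}, closest_N{};

  for (const object *c_obj : objs) {
    vec3 c_hit{}, c_N{};
    double c_dist = c_obj->hits(pos, dir, c_hit, c_N);
    if (c_dist < 0) continue;             // miss: skip this object
    if ((0.00001 < c_dist) && (c_dist < closest_dist)) {
      closest_dist = c_dist;  closest_obj = c_obj;
      closest_hit  = c_hit;   closest_N   = c_N;
    }
  }
  if (closest_obj != nullptr) {           // overwrite outputs only on a hit
    hit = closest_hit;  N = closest_N;  dist += closest_dist;
  }
  return closest_obj;
}
```

Because every per-ray value lives in a local or in the caller's pass-by-reference arguments, two threads can run this routine on the same objects without overwriting each other's results.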
hits() routines so that they
overwrite the incoming hit point and normal pass-by-reference
arguments—the objects' (internal) normal must not be
altered, and no last_hit should be stored by the
object (this should be removed from the object_t
class).
N = 1.0/radius * (hit - center)); these variables
are handled in a similar manner to dist, the
distance to the closest hit point from the ray's current
position, meaning that they are all pass-by-reference
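As an illustration of a hits() routine that writes only to its pass-by-reference arguments, here is a minimal sphere sketch. The vec3 and sphere types are hypothetical stand-ins for vec_t and the assignment's sphere class, the choice of the nearer positive root is an assumption, and the normal uses the formula quoted above.

```cpp
#include <cassert>
#include <cmath>

struct vec3 { double x, y, z; };          // stand-in for vec_t

static double dot(const vec3 &a, const vec3 &b)
{ return a.x * b.x + a.y * b.y + a.z * b.z; }

struct sphere {                           // stand-in for the sphere class
  vec3 center;
  double radius;

  // const: the object stores no last hit and its own data is untouched.
  // Returns the distance along the ray, or -1 on a miss.
  double hits(const vec3 &pos, const vec3 &dir, vec3 &hit, vec3 &N) const
  {
    vec3 oc = {pos.x - center.x, pos.y - center.y, pos.z - center.z};
    double a = dot(dir, dir);
    double b = 2.0 * dot(dir, oc);
    double c = dot(oc, oc) - radius * radius;
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return -1.0;          // ray misses the sphere

    double root = std::sqrt(disc);
    double t = (-b - root) / (2.0 * a);   // nearer intersection
    if (t < 0.00001) t = (-b + root) / (2.0 * a);  // ray starts inside
    if (t < 0.00001) return -1.0;         // sphere is behind the ray

    hit = {pos.x + t * dir.x, pos.y + t * dir.y, pos.z + t * dir.z};
    // N = 1.0/radius * (hit - center)
    N = {(hit.x - center.x) / radius,
         (hit.y - center.y) / radius,
         (hit.z - center.z) / radius};
    return t;
  }
};
```

The second root matters for the transparent-object case mentioned earlier: a ray that starts on or inside the sphere still finds its exit point.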
comment out the #pragma omp
directive to run the serial version (once your parallelized version
works). On my dual-core laptop, the parallel version takes 1.517
seconds, while the serial version takes 2.934 seconds: a speedup of
about 1.9, not quite a factor of 2 but noticeable. On a
quad-core machine, the speedup should be even better.
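One way to take such measurements yourself is sketched below. The helper names time_seconds() and speedup() are hypothetical; the sketch uses std::chrono wall-clock time, since CPU time (as reported by clock()) adds up the work of all threads and would hide the gain.

```cpp
#include <cassert>
#include <chrono>

// speedup(): serial run time divided by parallel run time
double speedup(double serial_s, double parallel_s)
{
  assert(parallel_s > 0.0);
  return serial_s / parallel_s;
}

// time_seconds(): wall-clock seconds taken by any callable,
// e.g. a lambda wrapping the nested rendering loops
template <typename F>
double time_seconds(F &&work)
{
  auto t0 = std::chrono::steady_clock::now();
  work();
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}
```

With the timings quoted above, speedup(2.934, 1.517) comes out at roughly 1.93.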
ray and color should be private, as each
independent process has its own version of these variables.
What about the model? Should there be many copies
of it, or just one? Just the one, so it should be shared.
Is there one img or many? Just one, so it too should
be shared. What about w and h, the image
dimensions? Do these change per process or do they stay the same?
In general, any time a variable must be written by one process
without interference from the others, it should be private.
img is the exception: it is shared because access to its pixels
is (must be) made exclusive, i.e., no two processes write to
the same pixel.
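To see the shared/private split on something smaller than the ray tracer, consider the toy sketch below. It deliberately uses its own hypothetical names (fill_gradient, out, cols, rows, shade) rather than the assignment's variables: one output buffer is shared, while the loop counters and the per-pixel scratch value are private.

```cpp
#include <cassert>
#include <vector>

// fill_gradient(): hypothetical stand-in for the rendering loops;
// writes a left-to-right grey ramp into a cols*rows buffer
std::vector<unsigned char> fill_gradient(int cols, int rows, int chunk)
{
  assert(cols > 0 && rows > 0 && chunk > 0);
  std::vector<unsigned char> buf(cols * rows);
  unsigned char *out = buf.data();  // one buffer for all threads
  int x, y;                         // loop counters: one copy per thread
  double shade;                     // scratch value: one copy per thread

  #pragma omp parallel for \
          shared(out, cols, rows) \
          private(x, y, shade) \
          schedule(static, chunk)
  for (y = 0; y < rows; y++) {
    for (x = 0; x < cols; x++) {
      shade = 255.0 * x / (cols > 1 ? cols - 1 : 1);
      // each (x,y) slot is written by exactly one thread, so sharing
      // the buffer is safe without any locking
      out[y * cols + x] = (unsigned char)shade;
    }
  }
  return buf;
}
```

If shade were shared instead, one thread could overwrite it between another thread's computation and its store, which is the same kind of race described for getlast_hit().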
Here are all the relevant (to ray tracing) variable declarations:
model_t model; // model (the world)
int tid,ncores=1; // thread id, no. of cores
int w=model.getpixel_w(); // image width (screen coords)
int h=model.getpixel_h(); // image height (screen coords)
int chunk; // no. rows per thread (core)
double wx,wy,wz=0.0; // pixel in world coords
double ww=model.getworld_w(); // world width (world coords)
double wh=model.getworld_h(); // world height (world coords)
vec_t pos=model.getviewpoint();// camera (ray) position
vec_t pix,dir; // pixel pos (world), ray dir
ray_t *ray=NULL; // a ray
rgb_t<double> color; // color set by ray
rgb_t<uchar> *imgloc,*img=NULL; // image and image location ptr
tar.gz
archive of your asg##/ directory, including:
README file containing
Makefile
.h headers and .cpp source)
make clean before tar)
handin notes