omp
compiler directives
-fopenmp
to the CFLAGS
line in the Makefile
tid
and ncores
in main()
find out how many cores there at your
disposal via this piece of code:
#pragma omp parallel private(tid)
{
if((tid = omp_get_thread_num())==0)
ncores = omp_get_num_threads();
}
chunk
to the number of
rows (height of the image to be ray traced) divided by
the number of cores—this stipulates how many
rows of the image each core will process—we parallelize
the ray tracer by chopping up the image into horizontal
bands, and having each core process each band of pixels
simultaneously
for
loops iterating over the image pixels
(x
,y
):
#pragma omp parallel for \
shared(...fill this in...) \
private(...fill this in...) \
schedule(static,chunk)
one important aspect of this assignment is to properly
figure out which variables are meant to be shared among
parallel processes and which are meant to only be manipulated
privately
img
,
an array of (pointer to) type
rgb_t<unsigned char>
,
allocated to size w*h
and that the code
inside the loops does not write to std::cout
directly—provide a second doubly-nested loop to
ouptut the pixels after the parallelized nested
loops finish
obj->getlast_hit()
This suggests that an object stores the surface point (and normal)
of the last ray that hit it. When there was only one ray at a time,
querying the object for this piece of info made sense. Now, however,
this design strategy leads to a race condition: in between
the time that some ray hit an object, another ray came along and
hit it again. The first ray, when querying the object for the hit
point, gets the second ray's location. This leads to rather nasty
color banding problems. To fix this:
ray_t::trace()
so that it no longer needs
and object_t *last_hit
argument
model_t::find_closest()
routine won't work
anymore because its initial design assumed that a ray would
hit any given object only once (and reflect). This won't
work when there are two (or more) rays hitting an object at
the same time or when transparent objects have volume,
through which the transparent ray needs to exit, once it
has penetrated it—see Asg 2.
find_closest()
routine needs to be rewritten:
mindist
to
closest_dist
and set it to
INFINITY
instead of to -1
obj
to c_obj
and rename the variable closest
to
closest_obj
dist
to
c_dist
and rename the lcoal variable
mindist
to closest_dist
find_closest
you now have four vec_t
pass-by-reference
variables pos
, dir
,
hit
, and N
and one
double
dist
obj==last_obj ||
evaluation from
the initial call to obj->hits()
—the
current object should only be skipped if the returned
dist < 0
(0.00001 < c_dist) &&
(c_dist < closest_dist)
(the distance to the intersection point must be
non-zero) making sure that you called
obj->hits
with arguments
pos
, dir
, c_hit
and c_N
(note that hits
will have to be rewritten; see below)
closest_dist
, closest_obj
,
closest_hit
, and closest_N
all get set to c_dist
, c_obj
,
c_hit
, and c_N
, respectively,
if the above if
statement is
true
model::find_closest()
must now
overwrite the incoming arguments
vec_t hit
and vec_t N
and add closest_dist
to dist
at the intersection point only if
closest_obj
is not NULL
(that is, hit
and N
get set to closest_hit
and
closest_N
, respectively).
closest_obj
find_closest
function in its logic should
resemble a typical "find the minimum value in an array"
problem. We're just setting dist
,
hit
, and N
(instead of a
single value) and doing so for the closest object
(instead of the minimum).
hits()
routines so that they
overwrite the incoming hit point and normal pass-by-reference
arguments—the objects' (internal) normal must not be
altered, and no last_hit
should be stored by the
object (this should be removed from the object_t
class).
N = 1.0/radius * (hit - center)
); these variables
are handled in a similar manner to dist
, the
distance to the closest hit point from the ray's current
position, meaning that they are all pass-by-reference
#pragma omp
directive to run the serial version (once your parallelized version
works); on my dual-core laptop, the parallel version takes 1.517
seconds, while the serial version takes 2.934 seconds, not quite
a factor of 2 reduction, but 1.9, which is noticeable. On a
quad-core machine, the speedup should be even better.
ray
and color
should be private as each
independent process has its own version of these variables.
What about the model
? Should there be many copies
of these, or just one? Just the one...so it should be shared.
Is there one img
or many? Just one, so it too should
be shared. What about w
and h
, the image
dimensions? Do these change per process or do they stay the same?
In general, any time you need mutual exclusivity, you need privacy.
But, the img
is shared because access to its pixels
is (must be) made exclusive, i.e., no two processes should write to
the same pixel.
Here are all the relevant (to ray tracing) variable declarations:
model_t model; // model (the world)
int tid,ncores=1; // thread id, no. of cores
int w=model.getpixel_w(); // image width (screen coords)
int h=model.getpixel_h(); // image height (screen coords)
int chunk; // no. rows per thread (core)
double wx,wy,wz=0.0; // pixel in world coords
double ww=model.getworld_w(); // world width (world coords)
double wh=model.getworld_h(); // world height (world coords)
vec_t pos=model.getviewpoint();// camara (ray) position
vec_t pix,dir; // pixel pos (world), ray dir
ray_t *ray=NULL; // a ray
rgb_t<double> color; // color set by ray
rgb_t<uchar> *imgloc,*img=NULL; // image and image location ptr
tar.gz
archive of your asg##/ directory, including:
README
file containing
Makefile
.h
headers and .cpp
source)
make clean
before tar
)
handin
notes