SMPOV 4.x -
Some Benchmarks and some ideas on parallel rendering with POV-Ray
and SMPOV.
SMPOV 4.x Benchmarks
What hardware was used for testing?
- SMPOV-Main runs on a Dual Athlon MP 2000+.
- On the same machine there is a RenderAgent with 2 CPUs enabled.
- Then there is a 2.5 GHz P-IV laptop.
- And finally an Athlon XP 1800+.
- All of them are connected via 100 MBit LAN.
All tests were run under W2k or Win XP. All machines were partly busy with other work during the tests.
Copyright notice:
Sometimes people ask if they can use pictures or text from this page for illustrations.
You can use pictures or text from this site for your illustrations under these conditions:
1. If you use it on the Internet, put a link to the original site somewhere on your site.
2. If you use it for material outside the Internet, there are no restrictions.
3. If you are not sure, just send me an e-mail, and you'll get written permission for your case.
Single-CPU renders of benchmark.pov
2971 sec. (Dual A 2000+, single CPU)
3125 sec. (A 1800+, single CPU)
3958 sec. (Laptop P-IV, single CPU)
------------------------------------------------------
1019 sec. (all CPUs together, using SMPOV)*
*benchmark.pov, no AA, at 800x600, 8 tiles (S)
To get an idea of each single CPU, let's see how each of them renders "benchmark.pov" at 800x600 without AA as a single picture, with no tiling. All renderings were distributed with SMPOV; the timings were taken from "timing.txt". Please note that PC 1 was running SMPOV-Main, but only one of its CPUs was used for the rendering.
Preprocessing time (photons etc.)
100 sec. (single Athlon MP 2000+)
66 sec. (P-IV laptop)
During this test I also measured the time for the POV-Ray preprocessing (everything before the counter starts with "Line 1 ..."). The A 2000+ needed 100 sec., while the P-IV laptop was ready after 66 sec. This preprocessing time is important to know for parallel distributed rendering, because it cannot be distributed or parallelized. Every tile must go through this preprocessing, so no matter how many CPUs you feed, the preprocessing time is one limiting factor.
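Because every tile repeats the preprocessing, it acts like the serial part in Amdahl's law: it never gets shorter, no matter how many CPUs are added. The sketch below is a best-case estimate under simplifying assumptions (equally fast CPUs, line rendering split perfectly evenly, no SMPOV overhead); the function names are illustrative and not part of SMPOV. The 85 sec. of line rendering is the 185 sec. plain POV-Ray time of the 160x120 test minus its 100 sec. of preprocessing.

```python
def best_case_time(preprocess_s: float, line_render_s: float, cpus: int) -> float:
    """Best-case wall time: every CPU runs the full preprocessing once,
    only the line rendering is divided among the CPUs (overhead ignored)."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return preprocess_s + line_render_s / cpus


def speedup_ceiling(preprocess_s: float, line_render_s: float) -> float:
    """Limit of the speedup for unlimited CPUs: the preprocessing never shrinks."""
    return (preprocess_s + line_render_s) / preprocess_s


# 160x120 benchmark.pov on one Athlon MP 2000+: 100 s preprocessing, 85 s line rendering
for n in (1, 2, 4, 8):
    print(f"{n} CPU(s): at best {best_case_time(100, 85, n):.1f} s")
print(f"speedup can never exceed {speedup_ceiling(100, 85):.2f}x")
```

For two CPUs the bound is 142.5 sec., and the speedup can never exceed 1.85x for this small picture, however many machines join in.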
Latency times example for using SMPOV (overhead)
SMPOV single render: 200 sec.*
POV 3.5 single render: 185 sec.*
Latency time in this case: 15 sec.
*benchmark.pov, 160x120, no AA, A 2000+ single CPU
Latency times when using SMPOV (overhead)
The next thing we should know when using SMPOV is the latency. SMPOV consists of a small pipeline of programs that work together to produce POV-Ray pictures. The following latencies apply:
- "inter-program communication" (between SMPOV, RenderAgent and PicPender)
- network and file system
- every loading and starting of POV-Ray
Think for a moment about what happens:
- If you use SMPOV, SMPOV talks to RenderAgent.
- RenderAgent starts a new instance of POV-Ray (at least one) and waits until it ends.
- After POV-Ray is done, RenderAgent talks to SMPOV ("Job is done!").
- Then SMPOV talks to PicPender ("Please put my tiles together etc.").
- Finally, PicPender tells SMPOV "The picture is ready!"
- At resolutions higher than 4096, depending on the tiling, there may be additional latency from the CPU-PicPender (PicPender-2.exe).
The whole process is organized like a production pipeline. Each of the programs checks every few seconds "Is there a job for me?" or "Is the job done?". This can't be done too often, since the checking itself should not waste a significant amount of CPU time. Finally, the time POV-Ray needs for loading and starting is included in the 200 sec. but not in the 185 sec., because normally you don't load and start POV-Ray anew for every picture. The picture on the left side shows:
- green = preprocessing time
- blue = overhead
- yellow = line-rendering time
- PicPending time
The picture shows that the only really parallel part is the line rendering. This is the part where we can save time.
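A rough way to reason about the polling overhead: if each hand-off in the pipeline is noticed by a program that checks every T seconds, that hand-off waits T/2 on average and T at worst. The sketch counts the four hand-offs listed above (SMPOV to RenderAgent, RenderAgent back to SMPOV, SMPOV to PicPender, PicPender back to SMPOV). The polling interval and the POV-Ray start-up time are hypothetical example values, not SMPOV settings.

```python
from dataclasses import dataclass


@dataclass
class PollingEstimate:
    expected_s: float
    worst_s: float


def pipeline_latency(hops: int, poll_interval_s: float, startup_s: float) -> PollingEstimate:
    """Each hand-off waits on average half a polling interval, at worst a full one.
    startup_s stands for loading and starting POV-Ray once per job."""
    expected = hops * poll_interval_s / 2 + startup_s
    worst = hops * poll_interval_s + startup_s
    return PollingEstimate(expected, worst)


# Hypothetical numbers: poll every 5 s, 3 s to load and start POV-Ray
est = pipeline_latency(hops=4, poll_interval_s=5.0, startup_s=3.0)
print(est)  # PollingEstimate(expected_s=13.0, worst_s=23.0)
```

Polling faster shrinks this overhead, but every check costs CPU time on every machine, which is the trade-off described above.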
Here we don't save much time ...
172 sec. in 2 tiles, 160x120*
185 sec. standard POV-Ray
*benchmark.pov in 2 tiles (V), Dual Athlon MP 2000+
Taking this into account, we can foresee two cases where we won't save much time using SMPOV:
- small pictures and single pictures with low computing time and high preprocessing time, where the time saved is smaller than the SMPOV overhead;
- very high-resolution pictures (>4096x4096) with low calculation time, where the CPU-PicPender and the copying of the tiles take a lot of time.
For the example on the left side, let's make a small calculation to see where the time has gone:
- 100 sec. preprocessing for each tile (both tiles in parallel)
- 22 sec. overhead (2 tiles)
- 50 sec. "pure line-rendering time", compared to 85 sec. for a single CPU
If we watch the rendering process carefully, we notice that one of the two parts finishes much faster than the other, so one CPU sits idle for quite some time.
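The same breakdown as a small calculation, using only the numbers measured above. The one interpretation added here is that the 50 sec. belong to the slower of the two tiles, since that tile decides when the picture is finished.

```python
PREPROCESS_S = 100         # measured, Athlon MP 2000+, benchmark.pov 160x120
SINGLE_POV_S = 185         # plain POV-Ray, one CPU
TWO_TILE_OVERHEAD_S = 22   # SMPOV overhead for 2 tiles
TWO_TILE_LINE_S = 50       # assumed: line rendering of the slower tile

single_line_s = SINGLE_POV_S - PREPROCESS_S          # line rendering on one CPU
two_tile_total_s = PREPROCESS_S + TWO_TILE_OVERHEAD_S + TWO_TILE_LINE_S
ideal_line_s = single_line_s / 2                     # perfectly even split
imbalance_s = TWO_TILE_LINE_S - ideal_line_s         # time lost to the uneven split

print(single_line_s, two_tile_total_s, imbalance_s)  # 85 172 7.5
```

So of the 172 sec., only 7.5 sec. are lost to the uneven split; the real cost is the 100 sec. of preprocessing that both tiles have to pay.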
... unless we have more CPUs
129 sec. in 4 tiles, 160x120*
*benchmark.pov in 160x120 in 4 tiles (S) with 4 CPUs
The only way to get such small pictures faster is pure power. So the P-IV laptop and the Athlon XP 1800+ each rendered one tile as well. Please note that in this configuration the exact numbers depend on chance, i.e. which CPU takes which tile. If the picture had much longer render times, we could solve this problem by increasing the number of tiles. In this case, however, we know that each tile needs 100 sec. of preprocessing time, which is longer than even the whole pure line-rendering time, so we can't get any faster than with one piece per CPU.
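A sketch of why more tiles than CPUs cannot help here: if each tile costs the full preprocessing, its share of the line rendering and a fixed per-tile overhead, and the tiles are handed out in rounds to equally fast CPUs, the wall time is the number of rounds times the per-tile cost. The equal CPU speeds are a simplification (the real machines differ), and the 15 sec. per-tile overhead is borrowed from the single-render latency measured above.

```python
import math


def tiled_wall_time(tiles: int, cpus: int, preprocess_s: float,
                    line_render_s: float, overhead_per_tile_s: float) -> float:
    """Wall time if tiles run in rounds of `cpus` at a time on equally fast CPUs."""
    per_tile = preprocess_s + line_render_s / tiles + overhead_per_tile_s
    rounds = math.ceil(tiles / cpus)  # each round repeats the preprocessing
    return rounds * per_tile


# 160x120 benchmark.pov: 100 s preprocessing, 85 s line rendering, 4 CPUs
for tiles in (1, 2, 4, 8):
    print(tiles, "tiles:", tiled_wall_time(tiles, 4, 100, 85, 15), "s")
```

With four CPUs the estimate is lowest at four tiles; at eight tiles every CPU has to preprocess twice, so the estimate jumps up instead of shrinking.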
Where we are going to save time ...
2971 sec. in 1 tile (single CPU)*
2007 sec. in 2 tiles (H) (Dual A 2000+)*
1342 sec. in 4 tiles (S)
1019 sec. in 8 tiles (S)
*benchmark.pov rendered at 800x600, Dual A 2000+
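The speedups relative to one CPU follow directly from the table. Because the four CPUs are not equally fast, the sketch also computes an ideal value as the sum of their relative speeds from the single-CPU table. It assumes the second CPU of the dual machine is as fast as the first and ignores preprocessing and overhead, so it is an upper bound, not a prediction.

```python
SINGLE_CPU_S = 2971  # benchmark.pov, 800x600, one CPU of the Dual A 2000+

# Single-CPU times of the four CPUs (first table); the dual machine counts twice
CPU_TIMES_S = [2971, 2971, 3125, 3958]

RUNS_S = {"2 tiles (H)": 2007, "4 tiles (S)": 1342, "8 tiles (S)": 1019}


def speedup(run_s: float, reference_s: float = SINGLE_CPU_S) -> float:
    return reference_s / run_s


def ideal_speedup(cpu_times_s: list[float], reference_s: float = SINGLE_CPU_S) -> float:
    """Sum of the CPUs' speeds relative to the reference CPU (no overhead at all)."""
    return sum(reference_s / t for t in cpu_times_s)


for label, seconds in RUNS_S.items():
    print(f"{label}: {speedup(seconds):.2f}x")
print(f"ideal with all four CPUs: {ideal_speedup(CPU_TIMES_S):.2f}x")
```

This prints speedups of about 1.48x, 2.21x and 2.92x, against an ideal of about 3.70x for all four CPUs together.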
Here we are going to save time, because:
- the preprocessing time is small compared to the absolute render time,
- we can use all CPUs efficiently and make a lot of tiles,
- the resolution is low, so the PicPending time is hardly over 1 sec.
The difference between 4 tiles and 8 tiles is probably just a result of the laptop getting the "hardest nut" in the 4-tile run: it was still busy long after all other CPUs had finished their rendering. The smaller the preprocessing/loading time compared to the line-rendering time, the more tiles we can make. The more tiles we make, the less CPU time is wasted waiting for the last tile to finish. However, every tile adds overhead, which includes the preprocessing time as well as the PicPending.
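The trade-off between idle CPUs and per-tile overhead can be explored with a small simulation of how the work is handed out: every CPU grabs the next waiting tile as soon as it is free. This is not SMPOV code. The relative CPU speeds are derived from the single-CPU table, while the 2871 sec. of line rendering (2971 minus about 100 sec. of preprocessing), the "hardest nut" factor and the fixed 100 sec. per-tile cost are illustrative assumptions; in particular the per-tile cost is not scaled by CPU speed, although the laptop preprocessed faster.

```python
import heapq


def makespan(tile_work_s: list[float], cpu_speeds: list[float], overhead_s: float) -> float:
    """Finish time when each CPU takes the next queued tile as soon as it is free.

    tile_work_s: line-rendering seconds per tile on a reference CPU (speed 1.0)
    cpu_speeds:  relative speed of each CPU
    overhead_s:  fixed cost per tile (preprocessing, start-up), not scaled by speed
    """
    free_at = [(0.0, cpu) for cpu in range(len(cpu_speeds))]  # (time CPU is free, CPU index)
    heapq.heapify(free_at)
    finish = 0.0
    for work in tile_work_s:  # tiles leave the queue in order
        start, cpu = heapq.heappop(free_at)  # the CPU that becomes free first
        done = start + overhead_s + work / cpu_speeds[cpu]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, cpu))
    return finish


def split_work(total_s: float, tiles: int, hardest: float) -> list[float]:
    """Split total work into tiles; the first tile is `hardest` times the others."""
    weights = [hardest] + [1.0] * (tiles - 1)
    unit = total_s / sum(weights)
    return [w * unit for w in weights]


# Relative speeds from the single-CPU table: dual (2 CPUs), A 1800+, P-IV laptop
speeds = [2971 / t for t in (2971, 2971, 3125, 3958)]
for tiles in (4, 8, 16, 32):
    work = split_work(2871, tiles, hardest=3.0)
    print(tiles, "tiles:", round(makespan(work, speeds, overhead_s=100)), "s")
```

Varying `tiles` and `overhead_s` shows both effects from the text: more tiles shorten the wait for the last tile, while every extra tile pays the fixed cost again.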
How about dropping more pics into SMPOV?
While SMPOV is rendering, you currently cannot drop more files into the SMPOV render queue. This may change in later versions. However, you can drop as many files as you want at once into SMPOV. Then another interesting effect happens: the more pictures in the queue, the fewer tiles you should use. Rendering one picture on each CPU is really efficient; tiling means overhead. The ideal case would be:
- the same number of CPUs and
- pictures, with
- all pictures having the same rendering time.
In this case you would set "Tiling" to 1 (no tiling) and simply use SMPOV to distribute your files to the CPUs. With this approach you also have no problems with Mech-Sims or radiosity, as all pictures are rendered in one piece.
Q: What would we do if one of the pictures has a shorter rendering time than the other 3?
A: In this case we would tile one of the other pictures into, for example, 4 pieces. This way we could still keep all the CPUs busy until the end.
To understand this effect, think of how SMPOV handles your files. SMPOV makes "render jobs" out of your files. All of these render jobs stay in the "COM\JOB" folder until a RenderAgent takes them and solves them. The effect of this system is that SMPOV does not wait until picture A is ready before sending out picture B. Each RenderAgent takes render jobs as they become available, regardless of which picture they belong to. Therefore CPU 1 may still be rendering picture A while CPU 2 is already rendering picture B or anything else. Because the RenderAgents simply take whatever render jobs are waiting, CPU utilization stays at 100% until all render jobs are gone. Many pictures in the queue with less tiling = higher CPU utilization/efficiency.
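The job-folder idea can be sketched in a few lines: render jobs for several pictures sit in one folder, and each agent takes whichever job file it finds first, claiming it by renaming so that no other agent picks it up. The file names, the `.job`/`.taken` extensions and the `render` callback are hypothetical; SMPOV's real COM\JOB format is not described on this page, and a real agent would sleep and poll again instead of stopping when the folder is empty.

```python
import tempfile
import threading
import time
from pathlib import Path
from typing import Callable, Optional


def make_jobs(job_dir: Path, pictures: dict[str, int]) -> None:
    """Write one job file per tile; `pictures` maps scene name -> number of tiles."""
    for scene, tiles in pictures.items():
        for tile in range(1, tiles + 1):
            (job_dir / f"{scene}_{tile}.job").write_text(f"{scene} tile {tile}\n")


def take_job(job_dir: Path, agent: str) -> Optional[Path]:
    """Claim any waiting job by renaming it; None means the folder is empty."""
    for job in sorted(job_dir.glob("*.job")):
        claimed = job.parent / f"{job.stem}.{agent}.taken"
        try:
            job.rename(claimed)
        except FileNotFoundError:  # another agent claimed it first
            continue
        return claimed
    return None


def run_agent(job_dir: Path, agent: str, render: Callable[[str], None]) -> None:
    """Take jobs, no matter which picture they belong to, until none are left."""
    while (claimed := take_job(job_dir, agent)) is not None:
        render(claimed.read_text().strip())
        claimed.unlink()  # job solved


def demo(agents: int = 3) -> tuple[list[str], list[Path]]:
    rendered: list[str] = []
    lock = threading.Lock()

    def render(job: str) -> None:
        time.sleep(0.01)  # stand-in for a POV-Ray run
        with lock:
            rendered.append(job)

    with tempfile.TemporaryDirectory() as tmp:
        job_dir = Path(tmp)
        # one big picture in 4 tiles and two small pictures untiled, all in one queue
        make_jobs(job_dir, {"sceneA": 4, "sceneB": 1, "sceneC": 1})
        threads = [threading.Thread(target=run_agent, args=(job_dir, f"cpu{i}", render))
                   for i in range(agents)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        leftover = list(job_dir.iterdir())
    return rendered, leftover


if __name__ == "__main__":
    jobs, left = demo()
    print(jobs, left)
```

The rename is what makes taking a job exclusive: only one agent can move a given file, so the others simply move on to the next waiting job.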
Time needed for PicPender I
2 sec.: irid.pov in 1600x1200 in 8 tiles (2,4)-(H)*, total PicPender data: 49 MB
8 sec.: irid.pov in 3200x2400 in 8 tiles (2,4)-(H)*, total PicPender data: 197 MB
13 sec.: irid.pov in 3200x2400 in 16 tiles (8,2)-(S)*, total PicPender data: 373 MB
14 sec.: irid.pov in 3200x2400 in 16 tiles (2,8)-(S)*, total PicPender data: 373 MB
15 sec.: irid.pov in 320x200 in 500 tiles (25,20)-(S)*, total PicPender data: 91 MB
94 sec.: irid.pov in 1600x1200 in 400 tiles (20,20)-(S)*, total PicPender data: 2.2 GB
177 sec.: woodbox.pov in 1600x1200 in 625 tiles (25,25)-(S)*, total PicPender data: 3.5 GB
* The given time is only the time for appending the tiles.
How much time does PicPender-1.exe need, and for what?
To find out, the newest version of SMPOV has a more detailed printout in "times.log". It shows exactly how much time was used by PicPender.exe. PicPender.exe has these jobs:
- PicPender.exe takes the picture tiles from the CPUs and puts them together into a full picture.
- PicPender-1.exe copies the resulting picture to its final location, that is, the place where the original file came from.
- It does some file-delete operations as well.
The most important part is putting the tiles together into one picture. To understand that more clearly, we take the examples on the left side.
What we can say about PicPender-1.exe is that:
- it makes no difference whether the tiling is vertical or horizontal,
- the time depends on the number of tiles
- and on the total amount of data to move around,
- it is capable of appending even a large number of tiles in a short time,
- while watching the CPU utilization of PicPender-1.exe, you'll see that the speed limitation comes from hard disk speed rather than from the CPU. I had 20%-40% CPU usage while PicPender-1.exe was running.
As long as we can use PicPender-1.exe, we do not need to worry about the time for appending/copying/deleting. Currently the maximum allowed number of tiles is 625.
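To compare the PicPender-1.exe runs, it helps to turn the table into a data rate. The sketch below only divides the listed data amount by the listed time; counting 1 GB as 1024 MB is an assumption.

```python
# (example, appending time in s, total PicPender data in MB), PicPender-1.exe table
PICPENDER1_RUNS = [
    ("irid 1600x1200, 8 tiles (2,4)-(H)", 2, 49),
    ("irid 3200x2400, 8 tiles (2,4)-(H)", 8, 197),
    ("irid 3200x2400, 16 tiles (8,2)-(S)", 13, 373),
    ("irid 3200x2400, 16 tiles (2,8)-(S)", 14, 373),
    ("irid 320x200, 500 tiles (25,20)-(S)", 15, 91),
    ("irid 1600x1200, 400 tiles (20,20)-(S)", 94, 2.2 * 1024),
    ("woodbox 1600x1200, 625 tiles (25,25)-(S)", 177, 3.5 * 1024),
]


def throughput_mb_s(seconds: float, total_mb: float) -> float:
    """Average data rate of the appending step."""
    return total_mb / seconds


for name, seconds, mb in PICPENDER1_RUNS:
    print(f"{name}: {throughput_mb_s(seconds, mb):.1f} MB/s")
```

Most runs land between roughly 20 and 29 MB/s, which fits the observation that the hard disk, not the CPU, is the limit. The 500-tile run at 320x200 is the outlier at about 6 MB/s; there the number of tiles rather than the amount of data dominates.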
Example times for PicPender-2.exe
3 sec.: irid.pov in 3200x2400 in 2 tiles (1,2)-(V)*, total PicPender data: 65 MB
19 sec.: irid.pov in 3200x2400 in 2 tiles (2,1)-(H)*, total PicPender data: 65 MB
19 sec.: irid.pov in 3200x2400 in 8 tiles (2,4)-(S)*, total PicPender data: 197 MB
20 sec.: irid.pov in 3200x2400 in 16 tiles (2,8)-(S)*, total PicPender data: 373 MB
120 sec.: irid.pov in 3200x2400 in 16 tiles (8,2)-(S)*, total PicPender data: 373 MB
3 sec.: irid.pov in 3200x2400 in 16 tiles (1,16)-(V)*, total PicPender data: 373 MB
255 sec.: irid.pov in 3200x2400 in 16 tiles (16,1)-(H)*, total PicPender data: 373 MB
3 sec.: irid.pov in 3200x2400 in 25 tiles (1,25)-(V)*, total PicPender data: 571 MB
68 sec.: irid.pov in 9600x9600 in 10 tiles (1,10)-(V)*, total PicPender data: 2.9 GB (= 42 MB/sec.!)
130 sec.: irid.pov in 12288x12288 in 4 tiles (1,4)-(V)*, total PicPender data: 2.1 GB (= 16.1 MB/sec.; size of one picture = 432 MB!)
What is different for PicPender-2.exe?
First of all, PicPender-2.exe uses CPU power and avoids using the graphics hardware. Therefore it takes much longer for this type of operation. It also has some internal limitations, which may cause some combinations of resolution and tiling to result in unusable pictures. However, we use PicPender-2.exe for all high-resolution pictures where PicPender-1.exe cannot append the pictures for whatever reason. In my tests this happens above a certain resolution, somewhere above 2048 or 4096. The value at which PicPender-2.exe is used instead of PicPender-1.exe is taken from the third line of the "private.ini" file.
To test PicPender-2.exe with the same resolutions as PicPender-1.exe before, we simply reduce this value and perform the same kind of operation. What we can say about PicPender-2.exe is that:
- not all combinations of tiling and resolution are valid,
- it is generally slower than PicPender-1.exe, sometimes by only 5%, sometimes significantly,
- horizontal tiling at resolutions above 4095 slows things down significantly,
- while vertical tiling doesn't at all,
- therefore, when rendering at resolutions above 4095x4095, prefer vertical tiling (= horizontal stripes),
- it is much more CPU-dependent,
- at high resolutions, where the total data is large and you use vertical tiling, the speed depends mainly on hard disk speed rather than on CPU speed.
In example 3 the time for appending the tiles would be longer than the actual total rendering time from example 2. So at high resolutions the tiling may make a difference, depending on the amount of computing time needed for the line rendering. Wherever possible, use vertical tiling (= horizontal stripes :-).
* The given time is only the time for appending the tiles.
Please note: by default, at 3200x2400 PicPender-1.exe is used, not PicPender-2.exe. We changed that manually to PicPender-2.exe for this test.
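A sketch of the selection rule described above. The text only says that the switch-over value comes from the third line of "private.ini"; this sketch additionally assumes that the line holds a bare integer and that it is compared with the larger picture dimension. Both are assumptions, so check your own private.ini before relying on them.

```python
from pathlib import Path


def parse_switch_value(ini_text: str) -> int:
    """Third line of private.ini, assumed here to hold a bare integer such as 4096."""
    return int(ini_text.splitlines()[2].strip())


def read_switch_value(private_ini: Path) -> int:
    return parse_switch_value(private_ini.read_text())


def choose_picpender(width: int, height: int, switch_value: int) -> str:
    """Assumes the larger picture dimension is compared with the switch-over value."""
    return "PicPender-2.exe" if max(width, height) > switch_value else "PicPender-1.exe"


def tiling_hint(width: int, height: int, switch_value: int) -> str:
    if choose_picpender(width, height, switch_value) == "PicPender-2.exe":
        return "prefer vertical tiling (horizontal stripes)"
    return "tiling direction hardly matters for appending"


# Hypothetical private.ini content; only the third line matters here
value = parse_switch_value("first line\nsecond line\n4096\n")
for w, h in ((3200, 2400), (9600, 9600)):
    print(w, h, choose_picpender(w, h, value), "-", tiling_hint(w, h, value))
```

With a switch-over value of 4096, a 3200x2400 picture stays with PicPender-1.exe, which matches the default noted above.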
Picture 1: benchmark1_04292006_080658_46756_3_0x60-80x60
Picture 2: benchmark1_04292006_080658_46756_4_80x60-80x60A
Picture 3: benchmark1_04292006_080658_46756_1_0x0-80x60A
Picture 4: benchmark1_04292006_080658_46756_2_80x0-80x60A
The 4 tiles as they come out of POV-Ray 3.6 in 4-tile (S) mode. The filenames from SMPOV tell PicPender how to connect the pictures.
Here is the full picture.
3200 x 2400 x 3 bytes / 1024 / 1024 = 21.97 MB (single picture)
21.97 MB x (25 + 1) = 571 MB (total PicPender data)
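A sketch of how tile file names like the ones above could drive the appending. The name layout assumed here (scene, date, time, run id, tile index, then XxY offset and WxH size, with anything after that ignored) is inferred from the four example names and is not documented SMPOV behaviour; the pixel lists stand in for real image files. Each tile image is full size and black outside its rectangle, so only that rectangle is copied.

```python
import re
from dataclasses import dataclass

# Assumed layout: scene_date_time_runid_tileindex_XxY-WxH, anything after that is ignored
TILE_NAME = re.compile(
    r"(?P<scene>.+)_(?P<date>\d{8})_(?P<time>\d{6})_(?P<run>\d+)_(?P<index>\d+)_"
    r"(?P<x>\d+)x(?P<y>\d+)-(?P<w>\d+)x(?P<h>\d+)"
)

Pixel = tuple[int, int, int]


@dataclass
class Tile:
    index: int
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int
    h: int


def parse_tile_name(name: str) -> Tile:
    m = TILE_NAME.match(name)
    if m is None:
        raise ValueError(f"not a tile file name: {name}")
    return Tile(*(int(m[k]) for k in ("index", "x", "y", "w", "h")))


def append_tiles(tiles: list[tuple[Tile, list[list[Pixel]]]],
                 width: int, height: int) -> list[list[Pixel]]:
    """Each tile image is full size (black outside its rectangle); copy only its rectangle."""
    out: list[list[Pixel]] = [[(0, 0, 0)] * width for _ in range(height)]
    for tile, image in tiles:
        for row in range(tile.y, tile.y + tile.h):
            out[row][tile.x:tile.x + tile.w] = image[row][tile.x:tile.x + tile.w]
    return out


print(parse_tile_name("benchmark1_04292006_080658_46756_4_80x60-80x60A"))

# Tiny 4x2 example: two tiles side by side, each rendered as a full-size image
W, H = 4, 2
red, blue, black = (255, 0, 0), (0, 0, 255), (0, 0, 0)
left, right = Tile(1, 0, 0, 2, 2), Tile(2, 2, 0, 2, 2)
left_img = [[red, red, black, black] for _ in range(H)]
right_img = [[black, black, blue, blue] for _ in range(H)]
picture = append_tiles([(left, left_img), (right, right_img)], W, H)
print(picture[0])  # [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
```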
|
How is the amount of "total PicPender data" calculated, and why is it so large?
Every tile rendered by POV-Ray produces a full-size picture, at least in POV-Ray 3.6; only the tile itself is coloured, while the rest is black. Therefore the total space temporarily used on your hard disk, as well as the amount of data that has to go "through PicPender.exe", is the size of a full picture times the number of tiles, plus the resulting picture. Example: 3200x2400 pixels at 24 bit (3 bytes per pixel) = 23,040,000 bytes; dividing by 1024 gives kilobytes and dividing by 1024 again gives about 22 MB.
That's the size of one picture on your hard disk. At the end we have one such file for each tile plus the resulting picture, so the total amount of data is 22 MB * (tile count + 1). In the case of 25 tiles, that's about 571 MB. You need to know this so that you have enough free space on your hard disk! In reality there are built-in optimizations that prevent PicPender from physically moving all the data through the PCI bus. Take a look at the examples on the left side and make sure you understand that when rendering at high resolutions, and therefore using PicPender-2.exe, horizontal tiling limits the process speed by your available CPU speed. Vertical tiling, on the other hand, is only limited by the speed of your hard disk. Both tilings are limited by the size of your hard disk.
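The disk-space rule above as a small helper. It assumes uncompressed 24-bit files and counts 1 MB as 1024 x 1024 bytes, which is what reproduces the totals in the tables; real tile files with headers or compression would differ slightly.

```python
def picture_mb(width: int, height: int, bytes_per_pixel: int = 3) -> float:
    """Uncompressed picture size in MB (1 MB = 1024 * 1024 bytes)."""
    return width * height * bytes_per_pixel / (1024 * 1024)


def picpender_total_mb(width: int, height: int, tiles: int) -> float:
    """One full-size file per tile plus the appended result."""
    return picture_mb(width, height) * (tiles + 1)


print(f"{picture_mb(3200, 2400):.1f} MB per 3200x2400 picture")       # 22.0
print(f"{picpender_total_mb(3200, 2400, 25):.0f} MB for 25 tiles")     # 571
print(f"{picture_mb(12288, 12288):.0f} MB per 12288x12288 picture")   # 432
```

The same formula gives about 49 MB for 1600x1200 in 8 tiles and about 2900 MB for 9600x9600 in 10 tiles, in line with the PicPender tables.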