SMPOV 4.x - Some Benchmarks and some ideas on parallel rendering with POV-Ray and SMPOV.

SMPOV 4.x Benchmarks - What hardware was used for testing?

  • SMPOV-Main runs on a dual Athlon MP 2000+
  • on the same machine there is a RenderAgent with 2 CPUs enabled
  • then there is a 2.5 GHz P-IV laptop
  • and finally an Athlon XP 1800+
  • all of them connected via 100 Mbit LAN

All tests ran under Windows 2000 or Windows XP. All machines were partly in use for other tasks during the tests.

 

 

Copyright notice:
Sometimes people ask if they can use pictures or text from this page for illustrations.
You can use pictures or text from this site for your illustrations under these conditions:
1. If you use it on the Internet, place a link to the original site somewhere on your site.
2. If you use it for material outside the Internet, there are no restrictions.
3. If you are not sure, just send me an e-mail, and you'll get written permission for your case.

Single-CPU render times for benchmark.pov

2971 sec. (Dual A 2000+, single CPU)
3125 sec. (A 1800+, single CPU)
3958 sec. (Laptop P-IV, single CPU)
------------------------------------------------------
1019 sec. (all CPUs together, using SMPOV)*

*benchmark.pov, no AA at 800x600, 8 tiles (S)

To get an idea of each CPU's individual performance, let's see how each of them renders "benchmark.pov" at 800x600 without AA as a single picture, without tiling. All renderings were distributed with SMPOV; timings were taken from "timing.txt". Please note that PC 1 was running SMPOV-Main, but only one of its CPUs was used for the rendering.
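As a rough sanity check, the single-CPU times can be turned into a best-case estimate for the combined run: each CPU renders 1/t of the picture per second, so the rates add up. This is a sketch under stated assumptions, not something SMPOV computes: it assumes the second CPU of the dual Athlon is as fast as the first, and it ignores preprocessing and SMPOV overhead, so it is an optimistic lower bound.

```python
# Single-CPU times for benchmark.pov, 800x600, no AA, in seconds (from the table above).
SINGLE_CPU_TIMES = {
    "Dual A 2000+ (CPU 1)": 2971.0,
    "Dual A 2000+ (CPU 2)": 2971.0,  # assumption: 2nd CPU as fast as the 1st
    "A 1800+": 3125.0,
    "Laptop P-IV": 3958.0,
}
MEASURED_SMPOV = 1019.0  # all CPUs together, 8 tiles (S)


def ideal_parallel_time(times):
    """Best case: work split exactly in proportion to each CPU's speed.

    Each CPU renders 1/t of the picture per second, so the combined rate is
    the sum of those rates. Preprocessing and SMPOV overhead are ignored.
    """
    combined_rate = sum(1.0 / t for t in times)
    return 1.0 / combined_rate


ideal = ideal_parallel_time(SINGLE_CPU_TIMES.values())
efficiency = ideal / MEASURED_SMPOV
print(f"ideal {ideal:.0f} sec., measured {MEASURED_SMPOV:.0f} sec., "
      f"efficiency {efficiency:.0%}")
```

With these numbers the ideal comes out at roughly 803 sec., so the measured 1019 sec. corresponds to an efficiency of about 79 %. The gap is consistent with the per-tile preprocessing, overhead and uneven tiles discussed in the following sections.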

Preprocessing time (photons etc.)

100 sec. (single Athlon MP 2000+)
66 sec. (P-IV laptop)

During this test I also measured the POV-Ray preprocessing time (everything that happens before the counter starts with "Line 1 ..."). The A 2000+ needed 100 sec., while the P-IV laptop was done after 66 sec. This preprocessing time is important to know for parallel distributed rendering, because it cannot be distributed or parallelized. Every tile must go through the full preprocessing, so no matter how many CPUs you use, the preprocessing time remains a limiting factor.
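A simple way to see why preprocessing limits the speed-up is the following model (our simplification, not SMPOV's own calculation): every tile pays the full preprocessing time, and only the line-rendering work is divided among the tiles. It assumes each tile runs on its own CPU and that the 2971 sec. single-CPU time already includes the 100 sec. of preprocessing; `overhead_per_tile` is a hypothetical parameter.

```python
def wall_time(preprocess, line_render, tiles, overhead_per_tile=0.0):
    """Wall time when every tile runs on its own CPU (tiles <= number of CPUs).

    Each tile repeats the full preprocessing; only line rendering is divided.
    overhead_per_tile is a hypothetical SMPOV/PicPender cost, 0 by default.
    """
    return preprocess + overhead_per_tile + line_render / tiles


PREPROCESS = 100.0            # sec., Athlon MP 2000+, measured above
LINE_RENDER = 2971.0 - 100.0  # assumption: the 2971 sec. include preprocessing

for tiles in (1, 2, 4, 8, 64):
    print(tiles, round(wall_time(PREPROCESS, LINE_RENDER, tiles)))
```

As the tile count grows, the result approaches 100 sec. but never drops below it: the preprocessing behaves like the serial part in Amdahl's law.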

Latency times example for using SMPOV (overhead)

SMPOV single render: 200 sec.*
POV-Ray 3.5 single render: 185 sec.*
Latency time in this case: 15 sec.

*benchmark.pov, 160x120, no AA, A 2000+ single CPU

Latency times when using SMPOV (overhead)
The next thing to know when using SMPOV is the latency involved. SMPOV includes a small pipeline of programs that work together to produce POV-Ray pictures. The following latencies apply:
  • "Inter-Program-Communication" (between SMPOV - RenderAgent - PicPender)
  • network and file system
  • every loading and starting of POV-Ray
Think for a moment about what happens:
  1. When you use SMPOV, SMPOV talks to RenderAgent.
  2. RenderAgent starts a new instance of POV-Ray (at least one) and waits until it ends.
  3. After POV-Ray is done, RenderAgent tells SMPOV ("Job is done!").
  4. Then SMPOV talks to PicPender ("Please put my tiles together etc.").
  5. Finally, PicPender tells SMPOV "The picture is ready!"
  6. At resolutions higher than 4096, depending on the tiling, there may be additional latency for the CPU-PicPender (PicPend-2.exe).
The whole process is organized like a production pipeline. All of the programs check every few seconds: "Is there a job for me?", "Is the job done?".
This polling cannot be done too often, since it should not waste a significant amount of CPU time itself. Finally, the time POV-Ray needs for loading and starting is included in the 200 sec., while it is missing from the 185 sec., because normally you do not load and start POV-Ray anew for every render. The picture on the left side shows:
  • green = Preprocessing time
  • blue = Overhead
  • yellow = Line-rendering time
  • PicPending-time
The picture shows that the only truly parallel part is the line rendering. This is the part where we can save time.
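The polling design explains part of this overhead. A back-of-the-envelope model (our assumption, not a measured SMPOV property): if a message is written at a random moment and the receiving program checks every p seconds, it waits on average p/2 before noticing it. The four hand-offs below are our reading of steps 1, 3, 4 and 5 in the list above; SMPOV's real polling interval is only described as "every few seconds", so the 4 sec. value is hypothetical.

```python
import random


def expected_polling_delay(handoffs, poll_interval):
    """Mean extra wait if each hand-off lands uniformly inside a poll interval."""
    return handoffs * poll_interval / 2.0


def simulated_polling_delay(handoffs, poll_interval, samples=100_000, seed=1):
    """Monte Carlo check of the closed form above."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # Wait until the next poll is uniform in [0, poll_interval].
        total += sum(rng.uniform(0.0, poll_interval) for _ in range(handoffs))
    return total / samples


HANDOFFS = 4         # SMPOV->RenderAgent, RenderAgent->SMPOV, SMPOV->PicPender, PicPender->SMPOV
POLL_INTERVAL = 4.0  # hypothetical: "every few seconds"
print(expected_polling_delay(HANDOFFS, POLL_INTERVAL))
print(round(simulated_polling_delay(HANDOFFS, POLL_INTERVAL), 1))
```

With a 4 sec. interval this adds about 8 sec. on top of the POV-Ray start-up time, the same order of magnitude as the 15 sec. measured above. Shorter intervals would cut this latency but cost more CPU time for polling, which is the trade-off described in the text.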

Here we don't save much time ...

172 sec. in 2 tiles, 160x120*
185 sec. standard POV-Ray

* benchmark.pov in 2 tiles (V), Dual Athlon MP 2000+

Taking this into account, we can see two cases where we won't save much time using SMPOV:
  1. small pictures and single pictures with low computing time and high preprocessing time, where the time saved is smaller than the SMPOV overhead;
  2. very high-resolution pictures (>4096x4096) with low calculation time, where the CPU-PicPender and the copying of the tiles take a lot of time.
For the example on the left side, let's make a small calculation to see where the time has gone:
  • 100 sec. preprocessing for each tile
  • 22 sec. overhead (2 tiles)
  • 50 sec. "pure line-rendering time", compared to 85 sec. for a single CPU
If we watch the rendering process carefully, we notice that one of the two parts finishes much faster than the other, so one CPU sits idle for quite some time.
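The breakdown can be checked with a few lines of arithmetic. The single-CPU split (100 sec. preprocessing plus 85 sec. line rendering) and the 2-tile figures are taken from the text; the small `RenderBreakdown` class is just an illustrative container.

```python
from dataclasses import dataclass


@dataclass
class RenderBreakdown:
    preprocessing: float   # sec., paid by every tile
    overhead: float        # sec., SMPOV/RenderAgent/PicPender latency
    line_rendering: float  # sec., wall time of the line-rendering phase

    def total(self) -> float:
        return self.preprocessing + self.overhead + self.line_rendering


plain_pov = RenderBreakdown(preprocessing=100, overhead=0, line_rendering=85)
two_tiles = RenderBreakdown(preprocessing=100, overhead=22, line_rendering=50)

saved = plain_pov.total() - two_tiles.total()
ideal_line = plain_pov.line_rendering / 2  # perfect 2-way split of line rendering
print(plain_pov.total(), two_tiles.total(), saved, ideal_line)
```

The line-rendering phase took 50 sec. instead of the ideal 42.5 sec., which matches the observation that one CPU finished early and sat idle; only 13 sec. are saved overall.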

... unless we have more CPUs

129 sec. in 4 tiles, 160x120*

* benchmark.pov in 160x120 in 4 tiles (S) with 4 CPUs.

 

The only way to get such small pictures faster is pure power. So the P-IV laptop and the Athlon XP 1800+ also rendered one tile each. Please note that in this configuration the exact numbers depend on a random factor: which CPU takes which tile. If the picture had much longer render times, we could solve this problem by increasing the number of tiles. In this case, however, we know that each tile needs 100 sec. of preprocessing time, which is much longer than the entire pure line-rendering time, so we can't get any faster than with one tile per CPU.
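To see why one tile per CPU is the best we can do here, consider a simplified model (assumptions for illustration: identical CPUs, evenly sized tiles, overhead ignored). With n tiles on k CPUs, the busiest CPU renders ceil(n/k) tiles and pays the 100 sec. preprocessing for each of them, while its share of the line rendering stays about the same.

```python
import math


def busiest_cpu_time(tiles, cpus, preprocess, line_render):
    """Wall time of the busiest CPU: evenly sized tiles, identical CPUs, no overhead."""
    tiles_on_busiest = math.ceil(tiles / cpus)
    # Each tile pays the full preprocessing plus its share of line rendering.
    return tiles_on_busiest * (preprocess + line_render / tiles)


PREPROCESS, LINE_RENDER, CPUS = 100.0, 85.0, 4  # 160x120 figures for one A 2000+
for tiles in (4, 8, 16):
    print(tiles, busiest_cpu_time(tiles, CPUS, PREPROCESS, LINE_RENDER))
```

Each doubling beyond one tile per CPU adds another 100 sec. of preprocessing while saving only a few seconds of line rendering. The 4-tile case gives about 121 sec., close to the measured 129 sec. once overhead and the mixed CPU speeds are taken into account.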

Where we are going to save time ...

2971 sec. in 1 tile (single CPU)*
2007 sec. in 2 tiles (H) (Dual A 2000+)*
1342 sec. in 4 tiles (S)
1019 sec. in 8 tiles (S)

*benchmark.pov rendered in 800x600, Dual A 2000+

Here we are going to save time, because:
  1. the preprocessing time is small compared to the absolute render time,
  2. we can use all CPUs efficiently and make a lot of tiles,
  3. the resolution is low, so the PicPending time is hardly over 1 sec.
The difference between 4 tiles and 8 tiles is most likely because the laptop got the "hardest nut" and was still rendering while all other CPUs had long since finished. The smaller the preprocessing/loading time compared to the line-rendering time, the more tiles we can make. The more tiles we make, the less CPU time is wasted waiting for the last tile to finish. However, every tile adds overhead, including the preprocessing time as well as the PicPending.
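The "waiting for the last tile" effect can be reproduced with a small list-scheduling simulation: each CPU takes the next job from a shared queue as soon as it is free, similar to the way RenderAgents pick up jobs. The CPU speed factors and the uneven tile costs below are made-up illustrative numbers, not measurements, and per-tile preprocessing is modeled as a fixed cost.

```python
import heapq


def makespan(tile_work, cpu_speeds, per_tile_fixed=0.0):
    """Simulate a shared job queue: each tile goes to whichever CPU is free first.

    tile_work: line-rendering work per tile, in seconds on a speed-1.0 CPU.
    cpu_speeds: relative speed of each CPU (1.0 = reference CPU).
    per_tile_fixed: cost paid again for every tile (e.g. preprocessing).
    Returns (time the last tile finishes, total idle CPU time until then).
    """
    free_at = [(0.0, cpu) for cpu in range(len(cpu_speeds))]  # (free time, CPU index)
    heapq.heapify(free_at)
    finish = [0.0] * len(cpu_speeds)
    for work in tile_work:  # jobs are taken in queue order
        start, cpu = heapq.heappop(free_at)
        done = start + (per_tile_fixed + work) / cpu_speeds[cpu]
        finish[cpu] = done
        heapq.heappush(free_at, (done, cpu))
    end = max(finish)
    idle = sum(end - f for f in finish)
    return end, idle


def split(total, weights):
    """Divide total work into tiles proportional to the given weights."""
    return [total * w / sum(weights) for w in weights]


SPEEDS = [1.0, 1.0, 0.95, 0.75]  # hypothetical relative CPU speeds
# Hypothetical scene: the lower half is three times as expensive as the upper half.
four_tiles = split(2871.0, [1, 1, 3, 3])
eight_tiles = split(2871.0, [1, 1, 1, 1, 3, 3, 3, 3])
for tiles in (four_tiles, eight_tiles):
    end, idle = makespan(tiles, SPEEDS, per_tile_fixed=100.0)
    print(f"{len(tiles)} tiles: done after {end:.0f} sec., idle CPU time {idle:.0f} sec.")
```

In this made-up example the 8-tile split finishes earlier and wastes far less CPU time waiting, even though the fixed per-tile cost is paid twice as often. Making the fixed cost large relative to the work reverses the result, which is the trade-off described above.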

How about dropping more pics into SMPOV?

While SMPOV is rendering, you currently cannot drop more files into the SMPOV render queue. This may change in later versions. However, you can drop as many files as you want into SMPOV at once. Then another interesting effect happens.
The more pictures in the queue, the fewer tiles you should use. Rendering one picture per CPU is really efficient; tiling means overhead. The ideal case would be:

  • the same number of CPUs and
  • the same number of pictures with
  • all the same rendering time
In this case you would set "Tiling" to 1 (no tiling) and simply use SMPOV to distribute your files to the CPUs. With this approach you also have no problems with Mech-Sims or radiosity, as all pictures are rendered in one piece.

Q: What would we do if one of the pictures has a shorter rendering time than the 3 others?
A: In this case we would tile one of the other pictures into, for example, 4 pieces. This way we could still keep all CPUs busy until the end. To understand this effect, think of how SMPOV works with your files. SMPOV makes "Render-Jobs" out of your files. All of these Render-Jobs stay in the "COM\JOB" folder until a RenderAgent takes them and processes them. The effect of this system is that SMPOV does not wait until picture A is ready before sending out picture B. Each RenderAgent takes Render-Jobs whenever they are available, without looking at which picture they belong to. Therefore CPU 1 may still be rendering picture A while CPU 2 is already rendering picture B or anything else. Because the RenderAgents are not tied to a particular picture, CPU utilization stays at 100% until all Render-Jobs are gone. Many pictures in the queue = higher CPU utilization/efficiency with less tiling.
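The job mechanism can be sketched as a folder-based queue. This is a hypothetical illustration of the idea, not SMPOV's actual file format: each job is a file with an assumed `.job` extension in a shared folder (here named `JOB` to echo "COM\JOB"), and an agent claims a job by renaming it into its own folder. On a single local file system a rename succeeds for only one of two racing agents; over network shares the guarantees depend on the file system. The `render` callback stands in for starting POV-Ray and is also hypothetical.

```python
import os
import time
from pathlib import Path
from typing import Callable, Optional


def claim_next_job(job_dir: Path, agent_dir: Path) -> Optional[Path]:
    """Move the first waiting job into this agent's folder and return its new path.

    Jobs are taken regardless of which picture they belong to. os.replace is a
    single rename, so when two agents race for the same file only one wins; the
    other gets FileNotFoundError and tries the next job.
    """
    agent_dir.mkdir(parents=True, exist_ok=True)
    for job in sorted(job_dir.glob("*.job")):
        target = agent_dir / job.name
        try:
            os.replace(job, target)
        except FileNotFoundError:
            continue  # another agent was faster
        return target
    return None


def agent_loop(job_dir: Path, agent_dir: Path, render: Callable[[Path], None],
               poll_interval: float = 2.0, max_idle_polls: int = 3) -> None:
    """Poll the job folder ("Is there a job for me?") until it stays empty."""
    idle_polls = 0
    while idle_polls < max_idle_polls:
        job = claim_next_job(job_dir, agent_dir)
        if job is None:
            idle_polls += 1
            time.sleep(poll_interval)
            continue
        idle_polls = 0
        render(job)  # hypothetical: would start POV-Ray for this job
```

Because an agent never asks which picture a job belongs to, a free CPU immediately starts on the next picture's jobs, which is why many pictures in the queue keep utilization high.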

Time needed for PicPender I

2 sec. irid.pov in 1600x1200 in 8 tiles (2,4)-(H)*
Total PicPender data: 49 MB

8 sec. irid.pov in 3200x2400 in 8 tiles (2,4)-(H)*
Total PicPender data: 197 MB

13 sec. irid.pov in 3200x2400 in 16 tiles (8,2)-(S)*
Total PicPender data: 373 MB

14 sec. irid.pov in 3200x2400 in 16 tiles (2,8)-(S)*
Total PicPender data: 373 MB

15 sec. irid.pov in 320x200 in 500 tiles (25,20)-(S)*
Total PicPender data: 91 MB

94 sec. irid.pov in 1600x1200 in 400 tiles (20,20)-(S)*
Total PicPender data: 2.2 GB

177 sec. woodbox.pov in 1600x1200 in 625 tiles (25,25)-(S)*
Total PicPender data: 3.5 GB

* given time is only the time for appending the tiles.

How much time does PicPender-1.exe need, and for what? To find out, the newest version of SMPOV has a more detailed printout in "times.log". It shows exactly how much time was used by the PicPender. PicPender.exe has three jobs:
  1. PicPender.exe takes the picture tiles from the CPUs and puts them together into a full picture.
  2. PicPender-1.exe copies the resulting picture to its final location, that is, the place the original file came from.
  3. It does some file-delete operations as well.
The most important part is putting the tiles together into one picture. To understand this more clearly, we look at the examples on the left side.

What we can say about PicPender-1.exe is that:
  1. it makes no difference whether the tiling is vertical or horizontal,
  2. the time depends on the number of tiles
  3. and on the total amount of data to move around,
  4. it can append even a large number of tiles in a short time,
  5. while watching the CPU utilization of PicPender-1.exe, you'll see that the speed limitation comes from hard disk speed rather than from the CPU. I saw 20%-40% CPU usage while PicPender-1.exe was running.
As long as we can use PicPender-1.exe, we do not need to worry about the time for appending/copying/deleting. Currently the maximum allowed number of tiles is 625.


Example times for PicPender-2.exe

3 sec. irid.pov in 3200x2400 in 2 tiles (1,2)-(V)*
Total PicPender data: 65 MB

19 sec. irid.pov in 3200x2400 in 2 tiles (2,1)-(H)*
Total PicPender data: 65 MB

19 sec. irid.pov in 3200x2400 in 8 tiles (2,4)-(S)*
Total PicPender data: 197 MB

20 sec. irid.pov in 3200x2400 in 16 tiles (2,8)-(S)*
Total PicPender data: 373 MB

120 sec. irid.pov in 3200x2400 in 16 tiles (8,2)-(S)*
Total PicPender data: 373 MB

3 sec. irid.pov in 3200x2400 in 16 tiles (1,16)-(V)*
Total PicPender data: 373 MB

255 sec. irid.pov in 3200x2400 in 16 tiles (16,1)-(H)*
Total PicPender data: 373 MB

3 sec. irid.pov in 3200x2400 in 25 tiles (1,25)-(V)*
Total PicPender data: 571 MB

68 sec. irid.pov in 9600x9600 in 10 tiles (1,10)-(V)*
Total PicPender data: 2.9 GB (= 42 MB/sec.!)

130 sec. irid.pov in 12288x12288 in 4 tiles (1,4)-(V)*
Total PicPender data: 2.1 GB
(= about 16.6 MB/sec.; size of one picture = 12288x12288x3/1024/1024 = 432 MB!)


What is different for PicPender-2.exe?

First, we should say that PicPender-2.exe uses CPU power and avoids using the graphics hardware. Therefore it takes much longer for this type of operation. It also has some internal limitations, which may cause some combinations of resolution and tiling to result in unusable pictures. However, we use PicPender-2.exe for all high-resolution pictures where PicPender-1.exe cannot append the pictures for whatever reason. This happens above a certain resolution, which in my tests was somewhere above 2048 or 4096. The value at which PicPender-2.exe is used instead of PicPender-1.exe is taken from the third line of the "private.ini" file. To test PicPender-2.exe with the same resolutions used for PicPender-1.exe before, we simply reduce this value and repeat the same kind of operation. What we can say about PicPender-2.exe is that:

  • not all combinations of tiling and resolution are valid,
  • it is generally slower than PicPender-1.exe, sometimes only by 5%, sometimes significantly,
  • horizontal tiling at resolutions above 4095 slows things down significantly,
  • while vertical tiling doesn't at all,
  • therefore, when rendering at high resolutions above 4095x4095, prefer vertical tiling (= horizontal stripes),
  • it is much more CPU dependent,
  • at high resolutions, where the total data is high and you use vertical tiling, the speed depends mainly on hard disk speed rather than on CPU speed.

In Example 3 the time for appending the tiles would be longer than the actual total rendering time from Example 2. So at high resolutions the tiling may make a difference, depending on the amount of computing time needed for the line rendering. Wherever possible, try to use vertical tiling (= horizontal stripes :-).

* given time is only the time for appending the tiles.
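The rules of thumb from these measurements can be written down as a small helper. Everything here is a hypothetical sketch: the switch-over value is read from the third line of private.ini as described above, but we assume that line holds a plain integer and that it is compared against the larger image dimension. Reading the tile notation such as (1,16)-(V) as (columns, rows) is also our interpretation of the examples.

```python
from pathlib import Path


def read_picpender2_threshold(private_ini: Path) -> int:
    """Read the PicPender-2 switch-over value from the third line of private.ini.

    Assumption: that line contains only an integer resolution.
    """
    lines = private_ini.read_text().splitlines()
    return int(lines[2].strip())


def suggest_layout(width: int, height: int, tiles: int, threshold: int):
    """Return (appender, (columns, rows)) following the rules of thumb above.

    Assumptions: the threshold is compared with the larger image dimension,
    and the tile notation (a,b) means (columns, rows). Whether a given
    combination is valid for PicPender-2.exe is not checked.
    """
    if max(width, height) > threshold:
        # PicPender-2.exe: avoid splitting into columns, use horizontal stripes.
        return "PicPender-2.exe", (1, tiles)
    # PicPender-1.exe: orientation hardly matters, so pick a near-square grid.
    columns = max(c for c in range(1, int(tiles ** 0.5) + 1) if tiles % c == 0)
    return "PicPender-1.exe", (columns, tiles // columns)
```

For example, `suggest_layout(3200, 2400, 16, 2048)` returns `("PicPender-2.exe", (1, 16))`, the layout that took 3 sec. above, instead of the 255 sec. (16,1)-(H) layout.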

Please note: by default, at a resolution of 3200x2400, PicPender-1.exe is used, not PicPender-2.exe. We changed that manually to PicPender-2.exe for this test.

 


Picture 1: benchmark1_04292006_080658_46756_3_0x60-80x60
Picture 2: benchmark1_04292006_080658_46756_4_80x60-80x60A



Picture 3: benchmark1_04292006_080658_46756_1_0x0-80x60A
Picture 4: benchmark1_04292006_080658_46756_2_80x0-80x60A


The 4 tiles as they come out of POV-Ray 3.6 in 4-tile (S) mode.
The filenames from SMPOV tell PicPender how to connect the pictures.
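The following sketch shows how tile filenames like the ones above could drive the assembly. The pattern is inferred from these four examples only (`..._<index>_<x>x<y>-<width>x<height>`, sometimes followed by a letter); the meaning of the trailing "A" is not explained on this page, so the sketch simply ignores it. Because POV-Ray 3.6 writes each tile as a full-size frame that is black outside the tile (explained below), assembling only needs to copy each tile's rectangle. Pixels are plain nested lists here to keep the example self-contained; the names `Tile`, `parse_tile_name` and `assemble` are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class Tile(NamedTuple):
    index: int
    x: int
    y: int
    width: int
    height: int


# Inferred from the examples above: ..._<index>_<x>x<y>-<width>x<height>[suffix][.ext]
TILE_NAME = re.compile(r"_(\d+)_(\d+)x(\d+)-(\d+)x(\d+)[A-Za-z]*(?:\.\w+)?$")


def parse_tile_name(name: str) -> Tile:
    match = TILE_NAME.search(name)
    if match is None:
        raise ValueError(f"not an SMPOV tile name: {name!r}")
    return Tile(*(int(g) for g in match.groups()))


Grid = List[List[int]]


def assemble(tiles: List[Tuple[str, Grid]], width: int, height: int) -> Grid:
    """Copy each tile's rectangle from its full-size (mostly black) frame."""
    picture = [[0] * width for _ in range(height)]
    for name, frame in tiles:
        t = parse_tile_name(name)
        for row in range(t.y, t.y + t.height):
            picture[row][t.x:t.x + t.width] = frame[row][t.x:t.x + t.width]
    return picture
```

Since each rectangle is copied exactly once, the order in which the tile files arrive does not matter; a real appender would of course work on image files rather than nested lists.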


Here is the full picture.

3200x2400*3/1024/1024 = 21.97 MB (single pic)
21.97 MB * (25+1) = 571 MB (total PicPender data)

How is the amount of "total PicPender data" calculated and why is it so much?

Any tile that is rendered by POV-Ray produces a full-size picture, at least in POV-Ray 3.6.

Only the tile itself is colored, while the rest is black. Therefore the total space temporarily used on your hard disk, as well as the amount of data that has to go through PicPender.exe, is the full picture size times the number of tiles, plus the resulting picture. Example: 3200x2400 pixels at 24 bit (3 bytes per pixel) is 3200x2400x3 = 23,040,000 bytes; divided by 1024 twice, that is about 21.97 MB.
That's the size of one picture on your hard disk. At the end we have one such picture for each tile plus the resulting picture. Therefore the total data amount is 21.97 MB * (tile count + 1). In the case of 25 tiles, that's 571 MB. You need to know this so that you have enough free space on your hard disk! In reality, there are built-in optimizations that prevent PicPender from physically moving all the data through the PCI bus. Take a look at the examples on the left side and make sure you understand that when rendering at high resolutions, and therefore using PicPender-2.exe, horizontal tiling limits the process speed to what your available CPU speed allows. Vertical tiling, on the other hand, is only limited by the speed of your hard disk. Both tilings are limited by the size of your hard disk.
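The formula above is easy to turn into a quick pre-flight check. A minimal sketch, assuming uncompressed 24-bit frames as described in the text; `enough_free_space` is a hypothetical helper built on Python's standard `shutil.disk_usage`, not part of SMPOV.

```python
import shutil


def picture_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed picture size; 3 bytes per pixel = 24-bit color."""
    return width * height * bytes_per_pixel


def total_picpender_bytes(width: int, height: int, tiles: int) -> int:
    """One full-size frame per tile plus the resulting picture."""
    return picture_bytes(width, height) * (tiles + 1)


def to_mb(n_bytes: int) -> float:
    return n_bytes / 1024 / 1024


def enough_free_space(path: str, width: int, height: int, tiles: int) -> bool:
    """Check the disk that will hold the temporary tiles (hypothetical helper)."""
    return shutil.disk_usage(path).free >= total_picpender_bytes(width, height, tiles)


for w, h, n in [(3200, 2400, 8), (3200, 2400, 16), (3200, 2400, 25), (1600, 1200, 400)]:
    print(f"{w}x{h}, {n} tiles: {to_mb(total_picpender_bytes(w, h, n)):.0f} MB")
```

The printed values match the figures in the tables above to within rounding, e.g. 571 MB for 25 tiles at 3200x2400 and about 2.2 GB for 400 tiles at 1600x1200.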

 
Theo Gottwald * Wolfartsweierer Str.1 * 76131 Karlsruhe | Phone (07 21) 9 66 33-00 | Fax (07 21) 9 66 33-99 | Info@it-berater.org | As of: 04/29/2006