Help Optimize Small Code Snippet 
Author Message
 Help Optimize Small Code Snippet

I'm trying to optimize (for speed) the following code snippet, which
prepares over a million small polygons for rendering. Before you start
asking (what?!), yes there are over a million individual data points
that 1.) have unique properties 2.) can change on the fly. Thus the
DrawPolygon function below is called repeatedly with new datasets.
Yes, using the Polygon API is fairly slow and I'm exploring OpenGL to
remove that lantency. Still, the actual determining of the vertex
placements on the screen is the purpose of the following function.

I'm running a new, pretty high end PC, and am seeing this function
take around .45 seconds to complete in the IDE and around .13 seconds
compiled (with optimizations selected).

This still seems quite slow for a 2.4 GHz Core 2 Duo, considering the
basic math involved. I was hoping that someone might see something
obvious that may help bring this down. The outer loop with the trig
functions actually runs quite fast (taking less than 1ms to complete
for all iterations), so there has to be something with the inner loop
that's causing the problem. My guess is that (at least partially) it's
the Long assigment to the POINTAPI array (as required for the Polygon
API) from the Single variables.

Thanks in advance!
wxforecaster

'REQUIREMENTS: Create a form with a listbox (list1)
'and a timer (timer1) whose interval is set to 10 to spawn this
function frequently.

'Timer APIs
Private Declare Function timeGetTime Lib "winmm.dll" () As Long
Private Declare Function timeBeginPeriod Lib "winmm.dll" (ByVal
uPeriod As Long) As Long
Private Declare Function timeEndPeriod Lib "winmm.dll" (ByVal uPeriod
As Long) As Long

Private Type POINTAPI
x As Long
y As Long
End Type

Dim bUnload As Boolean
Const Pi180 As Single = 0.0174533

Public Sub DrawPolygons()

Dim Start As Long
Dim i As Long, j As Long
Dim radarPoly(0 To 3) As POINTAPI

Dim jj As Single, cosaz As Single, sinaz As Single
Dim cosazplus As Single, sinazplus As Single

'Throw in some random approximate values that would be passed to this
function.
'Because these change, we cannot simply store the resultant polygons
in a
'permanant array. Hence the need to optimize the speed here as this
function will be called repeatedly.

Dim HalfWidth As Single: HalfWidth = 300.25
Dim HalfHeight As Single: HalfHeight = 301.13
Dim zoomscl As Single: zoomscl = 2#
Dim zoomwidth As Single: zoomwidth = 0.806667 * zoomscl
Dim zoomheight As Single: zoomheight = 0.8061111 * zoomscl
Dim vlft As Single: vlft = 89.90001 * zoomscl
Dim vtop As Single: vtop = 82.26666 * zoomscl
Dim sf As Single: sf = 0.25

Dim azang As Single, delang As Single

Dim oldx1 As Long, oldx2 As Long, oldy1 As Long, oldy2 As Long

'Create Lookup Table with bin distances
Dim bin_dist() As Single
ReDim bin_dist(1800)
For j = 0 To UBound(bin_dist)
    bin_dist(j) = 8 + j * sf
Next j

'Total Loop iterations = 720 (i) * 1800 (j) = 1296000
DoEvents
Start = timeGetTime

    'Run outer loop over 720 degrees and calculate some angular based
variables
    For i = 1 To 720

        azang = i / 2
        delang = azang + 0.49

        cosaz = Cos((450# - azang) * Pi180)
        sinaz = Sin((450# - azang) * Pi180)
        cosazplus = Cos((450# - (delang)) * Pi180)
        sinazplus = Sin((450# - (delang)) * Pi180)

            'Calculate Initial X,Y vertices before entering inner loop
            oldx1 = ((HalfWidth + (bin_dist(0) * cosaz)) * zoomwidth)
- vlft
            oldy1 = ((HalfHeight - (bin_dist(0) * sinaz)) *
zoomheight) - vtop

            oldx2 = ((HalfWidth + (bin_dist(0) * cosazplus)) *
zoomwidth) - vlft
            oldy2 = ((HalfHeight - (bin_dist(0) * sinazplus)) *
zoomheight) - vtop

            'Run inner loop and calculate vertices for remaining
points
            'Since all of our polygons are adjacent within the inner
loop, we only
            'need to calculate two new points each iteration and then
save them to
            'an variable to avoid the excess math calculations
            For j = 1 To 1800

                'get zoom/scale adjusted j from lookup table above
                jj = bin_dist(j)

                'Back left vertex
                radarPoly(0).x = oldx1
                radarPoly(0).y = oldy1

                'Back right vertex
                radarPoly(3).x = oldx2
                radarPoly(3).y = oldy2

                'Front left vertex then assign to Back left
                radarPoly(1).x = ((HalfWidth + (jj * cosaz)) *
zoomwidth) - vlft
                    oldx1 = radarPoly(1).x
                radarPoly(1).y = ((HalfHeight - (jj * sinaz)) *
zoomheight) - vtop
                    oldy1 = radarPoly(1).y

                'Front right vertex then assign to Back right
                radarPoly(2).x = ((HalfWidth + (jj * cosazplus)) *
zoomwidth) - vlft
                    oldx2 = radarPoly(2).x
                radarPoly(2).y = ((HalfHeight - (jj * sinazplus)) *
zoomheight) - vtop
                    oldy2 = radarPoly(2).y

                'We would then draw our polygon here, passing the
radarPoly() array to
                'Polygon API (or eventually OpenGL QUAD_STRIPS)

    Next j
  Next i

'Place a list box on the form to monitor how long it takes this
function to complete
Call List1.AddItem((timeGetTime - Start) / 1000, 0)

If List1.ListCount >= 100 Then List1.RemoveItem 99
'Calculate running average of the speed to account for fluctuations
Dim sum As Single
For i = 0 To List1.ListCount - 1
    sum = sum + CSng(List1.List(i))
Next i
    Me.Caption = sum / List1.ListCount

End Sub

Private Sub Form_Load()
'use API to set timer accuracy to 1 ms.
timeBeginPeriod 1
Me.Show
Me.Refresh

End Sub

Private Sub Form_QueryUnload(Cancel As Integer, UnloadMode As Integer)
bUnload = True
End Sub

Private Sub Form_Unload(Cancel As Integer)
'use API to end timer accuracy of 1 ms.
timeEndPeriod 1
Timer1.Interval = 2000
Timer1.Enabled = False
Set Form1 = Nothing
End Sub

Private Sub Timer1_Timer()
If bUnload = True Then Timer1.Enabled = False
DrawPolygons
End Sub



Sat, 12 May 2012 03:23:14 GMT  
 Help Optimize Small Code Snippet
I got 0.445 in the IDE and .148 in the EXE(with default compilation
options), and .127 in the EXE with advanced optimization options selected,
including favor Pentium Pro. I have XP+SP2+Intel Core 2 Quad 2.4 GHz+FSB
1067 MHz+ 4GB DDR-800(timing 4-4-4-12). My result is the same as yours even
though I have the 4 cores version.

Some suggestions:

- Use Sin/Cos lookup tables.
- Use binary fractions and integer calculations.
- Use multi-threading to take advantage of multiple cores. See Olaf's site:

http://www.thecommon.net/3.html

Search google for the above terms for more details.



Sat, 12 May 2012 04:08:56 GMT  
 Help Optimize Small Code Snippet

Quote:

> I'm trying to optimize (for speed) the following code snippet, which
> prepares over a million small polygons for rendering. ...
> ... I was hoping that someone might see something
> obvious that may help bring this down. The outer loop with the trig
> functions actually runs quite fast (taking less than 1ms to complete
> for all iterations), so there has to be something with the inner loop
> that's causing the problem. My guess is that (at least partially) it's
> the Long assigment to the POINTAPI array (as required for the Polygon
> API) from the Single variables.

...

1) Part of the reason it's in the inner loop is the inner loop is 1800
iterations/outer loop!  (Stating the obvious... :) )

2) You could test the above hypothesis by simply removing the structure
reference for a test and the duplicate assignment and use a float for
the LHS target.  I don't otomh have a feel for how this might be for
efficiency in VB but it wouldn't take long at all to test and find out.

3) If that isn't it, it looks like you could also precompute the other
scaling factors inside the loop once before the inner loop rather than
repeating them for each outer iteration as well.

I don't know if there are any really good profiling tools for VB or not;
if there are, that would be a good place to start to actually determine
where, precisely, the OH is then attack that.

My first inclination would be to write the same function in fortran and
compare run time on the same platform; if I get a few minutes tonight
and have the ambition and you haven't gotten any resolution I might try
to see how the two compiled versions compare.  That's what I normally do
w/ my VB compute-bound apps--move the computations into a Fortran DLL.

--



Sat, 12 May 2012 05:58:43 GMT  
 Help Optimize Small Code Snippet

Quote:
> 1) Part of the reason it's in the inner loop is the inner loop is 1800
> iterations/outer loop! ?(Stating the obvious... :) )

Yes this is obvious, but there's no way to change that. Running the
outer loop 1800 times and the inner 720 times still results in the
same number of iterations (stating the obvious). The current method is
actually optimized because I can minimize the number of inefficient
trignometric calculations (the outer loop is an angle).

The answer to the problem was actually in "Nobody"'s response. Since
each (x,y) coordinate polygon bin has an equal spacing per the linear
calculation in j loop, I decided that rather than calculate each x/y
vertex using the above math, I instead calculated the first and last,
and interpolate over the interval.

Doing this much more simple math, and then using integer (long)
variables and math, resulted in a HUGE improvement. Basically I
multiplied everything by 1000 to eliminate the decimals, and then at
the end add 500 and integer redivide by 1000 to avoid rounding
problems.

The results are astounding. 0.38 seconds in the IDE (not much
improvement as I would have suspected), but .02 seconds in the
compiled version!
Furthermore, this calculation also allows me to only calculate those
polygons that need to be plotted, rather than every iteration of the i
& j loops. For a few examples I tested, the processing time is
something like .006 seconds.

Another interesting find: DO NOT use the VB Round() function -- ever.
Doing Round(x/) versus CLng(x+.5) or x + 500 \ 1000 (as in my case),
yields a 400-500% loss in speed!

wxforecaster



Mon, 14 May 2012 06:10:18 GMT  
 Help Optimize Small Code Snippet

Quote:
> Basically I multiplied everything by 1000 to eliminate the decimals, and
> then at
> the end add 500 and integer redivide by 1000 to avoid rounding
> problems.

<Quote fixed>

If you multiply or divide by 256 or (65536) or any power of 2, you can just
bit-shift to multiply or divide, which is faster. If you pick 256 or 65536,
you can just shift by a byte or two. I tested some situations and assembly
output to see how efficient the code that is generated by VB6:

Case 1: "Remove Integer Overflow Checks" is not selected(the default)

- Multiply("*"): VB would always use the slow "imul" to multiply regardless
if the number was a multiple of power of 2 or not. This instruction takes 13
to 37 clocks to execute if memory serves.
- Integer Division("\"): VB would do bit shifting if the number was a
multiple of power of 2. This takes one clock. If the number is not a
multiple of power of 2, a combination of bit shifting and "imul" are used,
which is slower in this case.

Case 2: "Remove Integer Overflow Checks" is selected:

- Multiply("*"): VB would do bit shifting if the number was a multiple of
power of 2. This takes one clock. If the number is not a multiple of power
of 2, a combination of bit shifting and occasionally "imul" are used.
- Integer Division("\"): Same as case 1.

There are some tricks used by assembly programmers to use bit-shifting for
certain numbers instead of multiply or divide. For example, in case of
number 10, you have this equation:

y = x * 10

They rewrite it as follows:

y = x * (2 + 8) ' 10 is broken down to its binary factors

y = 2x + 8x

Now this can be done in 3 steps:

1 - Bit shift for "2x", takes one clock.
2 - Bit shift for "8x", takes one clock.
3 - Add the result from steps 1 and 2, takes one clock.

If "imul" is used instead, it would take longer.

VB compiler seems to take these steps. In the case of 1000, it does the
following:

y = x * 1000
y = x * (8 * 125)
y = x * (8 * (5 * 5 * 5))
y = x * (8 * ((4 + 1) * (4 + 1) * (4 + 1)))

The assembly output is as follows:

36:       k = i * 1000
00401D30 8B 46 34             mov         eax,dword ptr [esi+34h]
00401D33 8D 04 80             lea         eax,[eax+eax*4]
00401D36 8D 04 80             lea         eax,[eax+eax*4]
00401D39 8D 14 80             lea         edx,[eax+eax*4]
00401D3C C1 E2 03             shl         edx,3
00401D3F 89 56 3C             mov         dword ptr [esi+3Ch],edx

These are 6 instructions. Here is what they do:

1 - Reads "i" value, and put it into eax register(32 bits).
2-4 Multiply by 5 each, using an assembly trick.
5 - Bit shift left 3 times, which is the equivalent to multiplying by 8. 2 ^
3 = 8.
6 - Save the result into "k" variable.

I don't know how long the above takes, but my guess is that it's 6 to 11
clocks.

If you use 1024 instead of 1000, here is the assembly output:

36:       k = i * 1024
00401D30 8B 56 34             mov         edx,dword ptr [esi+34h]
00401D33 C1 E2 0A             shl         edx,0Ah
00401D36 89 56 3C             mov         dword ptr [esi+3Ch],edx

These would take 3 to 5 clocks.

In case of integer division by 1000, VB did the following:

y = x / 1000
y = (x * 274877907) / (274877907000)
y = (x * 274877907) / (64 * 2^32)

I am not sure why VB does this, as there is a "idiv" instruction to use.
Enabling "Favor Pentium Pro" didn't make a difference.

Here is the assembly output:

36:       k = i \ 1000
00401D30 8B 4E 34             mov         ecx,dword ptr [esi+34h]
00401D33 B8 D3 4D 62 10       mov         eax,10624DD3h
00401D38 F7 E9                imul        ecx
00401D3A C1 FA 06             sar         edx,6
00401D3D 8B C2                mov         eax,edx
00401D3F C1 E8 1F             shr         eax,1Fh
00401D42 03 D0                add         edx,eax
00401D44 89 56 3C             mov         dword ptr [esi+3Ch],edx

These would take 20 to 50 clocks.

In case of integer division by 1024, here is the assembly output:

36:       k = i \ 1024
00401D30 8B 46 34             mov         eax,dword ptr [esi+34h]
00401D33 99                   cdq
00401D34 81 E2 FF 03 00 00    and         edx,3FFh
00401D3A 03 C2                add         eax,edx
00401D3C C1 F8 0A             sar         eax,0Ah
00401D3F 89 46 3C             mov         dword ptr [esi+3Ch],eax

These would take 6 to 8 clocks, so it's much faster to use powers of 2.



Mon, 14 May 2012 09:16:01 GMT  
 
 [ 5 post ] 

 Relevant Pages 

1. Help with this code snippet please!

2. Help explaining a code snippet?

3. Help Optimizing Code

4. Code Database Program - Store your code Snippets

5. Help on Optimizing Code

6. Help Optimizing VB code

7. Help on Optimizing Code

8. Help needed to optimize this code

9. Help needed to optimize this code

10. Code Database Program- Store your code snippets

11. samples code snippets?

12. combobox: converting code snippet from VBA to VB

 

 
Powered by phpBB® Forum Software