Unity3D threads – measuring performance

Hey everybody!
Sometimes people ask me about delayed actions or threading in Unity3D, and I usually suggest coroutines, since they fit most of the cases I have faced.
But sometimes we need true threading to make some calculations faster, A* pathfinding for example.
So I decided to put together a quick performance comparison of traditional code execution vs. a threaded version. I searched for a simple thread manager and came across the really simple Loom class from whydoidoit.

I made a simple test app for different platforms (the archive with apps and sources is attached at the end of this post) and found some of the results pretty interesting.
All my code does is make the CPU think a little while working on a huge array of Vector3 instances (10 000 000 for desktop platforms and 1 000 000 for mobile platforms):

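private void Run()
{
	const float scale = 432.654f;

	// Deliberately wasteful math on every element, just to keep the CPU busy.
	for (int j = 0; j < arrayLength; j++)
	{
		Vector3 v = vertices[j];

		// magnitude is recomputed several times on purpose (each call costs a square root).
		v = Vector3.Lerp(v * scale / 123f * v.magnitude, v / scale * 0.0123f * v.magnitude, v.magnitude);

		vertices[j] = v;
	}
}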

This is simple dummy code, as you can see.
I added a simple Ant model (hello, Away3D examples authors! :)) that rotates in Update() to the scene, so you can see how the app freezes while this ugly code runs on the main thread.
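
A minimal sketch of how a loop like Run() could be split across worker threads. This is not the code from the test project and not the Loom class. ChunkedJob and its members are hypothetical names, and it uses only plain System.Threading so it also works outside Unity:

```csharp
using System;
using System.Threading;

// Hypothetical helper (assumption: not part of the original project or of Loom).
public sealed class ChunkedJob
{
    private int remaining; // number of worker threads that have not finished yet

    // True once every worker has finished its range.
    public bool IsDone
    {
        get { return Thread.VolatileRead(ref remaining) == 0; }
    }

    // Splits [0, length) into threadCount contiguous ranges and runs
    // body(start, end) for each range on its own thread. Returns immediately.
    public void Start(int length, int threadCount, Action<int, int> body)
    {
        if (threadCount < 1) threadCount = 1;
        int chunk = (length + threadCount - 1) / threadCount; // ceiling division
        remaining = threadCount;

        for (int t = 0; t < threadCount; t++)
        {
            // Declared inside the loop, so each lambda captures its own copy.
            int start = Math.Min(t * chunk, length);
            int end = Math.Min(start + chunk, length);

            Thread worker = new Thread(() =>
            {
                try { body(start, end); }
                finally { Interlocked.Decrement(ref remaining); }
            });
            worker.IsBackground = true; // do not keep the process alive on exit
            worker.Start();
        }
    }
}
```

In a scene, the body passed to Start() would hold the inner part of the loop from Run() for indices start..end, and Update() would poll IsDone instead of blocking, so the main thread keeps rendering while the workers run.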


As mentioned, I ran a few tests of this code in both sync (straight execution on the main thread) and async (running in separate threads) modes. Here are the results I got (S = sync, A = async):

PC (Intel Core i7 2600K 4.6GHz, 4-core + Hyper Threading)

Standalone PC build (Fastest quality, 1024×768 windowed)
S: 920.8 ms
A: 198.2 ms (8 threads, x4.64)

WebPlayer build (Chrome 25.0.1364.172, Unity WebPlayer Release Channel Plugin v. 4.1.1f)
S: 938.1 ms
A: 200.4 ms (8 threads, x4.68)

PC (Intel Core i7 Q740 1.73GHz, 4-core + Hyper Threading)

Standalone PC build (Fastest quality, 1024×768 windowed)
S: 1732.1 ms
A: 584.4 ms (8 threads, x2.96)

WebPlayer build (Chrome 25.0.1364.172, Unity WebPlayer Release Channel Plugin v. 4.1.1f)
S: 1780.0 ms
A: 555.3 ms (8 threads, x3.21)

PC (Intel Atom N455 1.66GHz 1-Core + Hyper Threading)

Standalone PC build (Fastest quality, 1024×600 windowed)
S: 7835.3 ms
A: 6533.9 ms (2 threads, x1.20)

WebPlayer build (Chrome 25.0.1364.172, Unity WebPlayer Release Channel Plugin v. 4.1.1f)
S: 7701.6 ms
A: 6778.3 ms (2 threads, x1.14)

Mobile

Galaxy Tab 10.1 (1.4GHz OC 2-core CPU, Tegra 2, Android 4.0.4, GT-P7510)
S: 1479.0 ms
A: 831.1 ms (2 threads, x1.78)

iPad 1 (1GHz 1-core CPU)
S: 5364.0 ms
A: 6865.7 ms (1 thread, x0.78)

Pretty cool, huh? We get significant performance improvements (up to x4.68!) on CPUs with 2 or more cores, and an overall improvement everywhere except the iPad 1 test.
In async mode the app never froze on any tested platform, so we could watch the mesh rotate smoothly (or not so smoothly, as on the Atom) while our dummy code was executing in the separate thread(s). Expected, but very pleasant!

Take a look at the two different generations of Intel Core i7 CPUs: both have 4 cores + HT, but the newer generation (2600K) uses threads much more efficiently.
The tests on the Intel Atom (1 core + HT) were interesting: we ran 3 threads in total there (1 main and 2 child dummy threads) and still got a small speed boost on a 1-core CPU. Looks like HT helped us there, nice!
I also noticed that the overall FPS dropped significantly on the Atom, though the app still stayed interactive.

The mobile tests are quite interesting as well.
The Galaxy Tab 10.1 turned out much faster than the iPad in these tests. I guess the 1.4GHz OC helped here though =P
The iPad 1 has a 1-core CPU, so it can’t run 2 threads in parallel; that’s why running our code in a separate thread is slower here. In effect we make the CPU switch back and forth between two threads: the main one, with our scene and rotating mesh, and the child one, with our dummy calculations. Async execution still prevents the app from freezing, though, and in some cases that matters more than raw performance (that’s where coroutines can usually help, BTW).

And we can see threading in Unity3D works on mobile devices like a charm, cool!

Let’s take a look at the overall timings comparison:

[Chart: overall timings comparison]
The overall difference is not so huge… But with async code execution we get freeze-free apps on all platforms plus a little performance boost. Not bad!

And now the selection with 2+ core CPUs only:
[Chart: timings comparison, 2+ core CPUs only]

Whoa! We see a great performance leap here for the async code execution scheme. We should definitely use threads in CPU-intensive algorithms to avoid app freezing and to perform better on 2+ core CPU configurations.

So if your

SystemInfo.processorCount

returns 2 or more, an async implementation of your CPU-intensive algorithms is definitely the way to go.
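
The rule of thumb above could be wrapped in a tiny helper, sketched here under assumptions: WorkerPolicy and its methods are hypothetical names, and the "one worker per reported processor" choice simply mirrors the thread counts used in the tests.

```csharp
using System;

// Hypothetical helper (assumption: not from the original project).
public static class WorkerPolicy
{
    // One worker per reported logical processor, never fewer than one.
    public static int WorkerCount(int processorCount)
    {
        return Math.Max(1, processorCount);
    }

    // Rule of thumb from the tests: expect a speedup only with 2+ processors.
    public static bool ExpectSpeedup(int processorCount)
    {
        return processorCount >= 2;
    }
}
```

In Unity you would pass SystemInfo.processorCount; outside Unity, Environment.ProcessorCount plays the same role. With a single processor a worker thread can still keep the app responsive, it just won’t make the calculation itself faster.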

You may notice I didn’t test threads in SWFs exported from Unity3D. The reason is simple: threading is not supported (yet?) by the Flash Exporter. I guess it’s a pretty complicated task, since the only way to implement true threading in Flash is to use Workers. But the Workers approach differs too much from threading in .NET, in my opinion, so I guess we won’t see threads in the Flash Exporter’s list of supported features any time soon, if ever.

Don’t forget that additional threads load the CPU of multicore mobile devices much more heavily, which shortens battery life, so be careful with threads on mobile!
And as DbIMok commented on the Russian version of this blog post: keep in mind that true threads can’t work with Unity3D engine objects (there are some exceptions though, like Debug.Log()), but coroutines can, because they follow the “green threads” concept: they run code in chunks (every “frame”) within the main thread.
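
One common way around that restriction is to let worker threads hand their results back to the main thread through a queue. Below is a minimal sketch of that idea in plain C#. MainThreadQueue is a hypothetical name, and this is not the Loom source:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (assumption: not from the original project, not Loom's code).
public sealed class MainThreadQueue
{
    private readonly Queue<Action> pending = new Queue<Action>();
    private readonly object gate = new object();

    // Safe to call from any thread.
    public void Enqueue(Action action)
    {
        lock (gate) { pending.Enqueue(action); }
    }

    // Call from the main thread only, e.g. once per frame.
    // Returns the number of actions that were executed.
    public int Drain()
    {
        Action[] batch;
        lock (gate)
        {
            batch = pending.ToArray();
            pending.Clear();
        }

        // Run outside the lock so actions may enqueue follow-up work.
        foreach (Action action in batch) action();
        return batch.Length;
    }
}
```

A worker would do its math on plain data, then call Enqueue() with a lambda that touches the engine objects; a MonoBehaviour would call Drain() from Update(), so that lambda runs on the main thread.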

I think that’s all for now. You can grab the archive with sources and compiled apps here (~27MB).
This is an LZMA2-compressed 7-zip archive. Use 7-zip 9.x or similar to unpack it.

PS: I’d be glad to hear your thoughts on this topic, as well as any test results from your own hardware and devices!


Comments

Unity3D threads – measuring performance — 10 Comments

  1. Does using more processes increase the power consumption?
    I.e., do we need to think about what we code, or do we really have one more option to speed up the app without drawbacks?

  2. forget it, I was opening the project from an external drive. the scripts didn’t compile. Got it working, going through them now!

  3. All the script files in the downloadable project are empty. Any clue as to why that is?
    I’d like to confirm your numbers myself and add to them.

    Good work!

    • Hey, Coach! Thanks for your comment. I just downloaded the project archive to make sure all the scripts are fine there, and they are! Please check that your archiver can unpack LZMA2 archives (or try the latest 9.x 7-Zip alpha version)!

  4. Thanks for doing this. Well we should expect multiple cores to help with threading, but it’s good to see it done 🙂

    But what’s really cool is to see a single core with “HyperThreading” really does make a difference!
