Mostly this is making inlining match CUDA 7.5 in a few performance-critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.
On benchmark scenes, render times are reduced by 3% to 35%. Stack memory
usage is slightly reduced too.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2269
Some of the files wrongly attributed code to other organizations,
and in a few places proper attribution was missing.
This is mainly due either to copy-paste errors (when a new file
was created from an existing one and the header wasn't updated)
or to refactors which split non-original-BF code from purely
BF code.
This should clear up some of the confusion around attribution.