Of course, let the machine do your job.

Compilers provide options to generate code optimized for a given piece of hardware (the compiler then uses the full instruction set of that hardware). I have no idea about the ARM processor you mentioned, but check your compiler's manual to see if it provides an option for it.
Almost every compiler also supports generic optimization options. You can use those in any case, since they are platform independent.
Finally, just 2 words of caution:
1. When you use optimization options you lose some debug information (stepping through code and inspecting variables become less reliable). I suggest you also test your application in its optimized form, if you have only been testing the debug build so far.
2. Regarding the "commonly used techniques", I suggest you first find out what has to be optimized using a tool like Quantify. There is no point in spending time and effort optimizing a function that's called only 10 times in an hour. On the other hand, removing a temporary variable in a small function sometimes helps performance because that function is called 10000 times. So if you're using the "commonly used techniques", use them where they're needed.
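If you don't have a profiler at hand, even a crude timer gives you a first idea of where the time goes. Below is a minimal sketch using only the standard C `clock()` function. The names `sum_squares` and `time_calls` are made up for illustration, and `sum_squares` just stands in for whatever function you suspect is hot. A real profiler will tell you far more than this does.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: sum of the squares 1..n. Replace it with the
   function you actually want to measure. */
unsigned long sum_squares(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 1; i <= n; ++i)
        total += i * i;
    return total;
}

/* Calls fn(arg) `calls` times and returns the CPU time used, in seconds,
   as reported by clock(). */
double time_calls(unsigned long (*fn)(unsigned long), unsigned long arg,
                  unsigned long calls)
{
    assert(fn != NULL);

    /* Accumulate into a volatile so the results are not simply discarded. */
    volatile unsigned long sink = 0;

    clock_t start = clock();
    for (unsigned long i = 0; i < calls; ++i)
        sink += fn(arg);
    clock_t end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling something like `time_calls(sum_squares, 1000, 100000)` once for the debug build and once for the optimized build shows whether the function is worth any hand tuning at all. The resolution of `clock()` is coarse on many systems, so use a large call count.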
Other things you can do:
- data caching
- use unsigned variables
- use inline functions (newer compilers do this implicitly, though)
- use registers
Finally, I hope you've had a look at what everyone else has said in this thread.
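To make a few of the items in the list above concrete (data caching, unsigned types, inline functions), here is a rough sketch in C. It counts set bits with a cached lookup table instead of recomputing the answer every time. All names (`bit_table`, `count_bits_u8`, `count_bits_u32`) are made up for this example, and whether it actually beats the straightforward loop on your hardware is something you'd have to measure.

```c
#include <assert.h>

/* Cached results: bit_table[b] holds the number of set bits in byte b. */
static unsigned char bit_table[256];
static int bit_table_ready = 0;

/* Fills the cache once. Each entry reuses the already computed entry
   for i / 2 and adds the lowest bit of i. */
static void init_bit_table(void)
{
    bit_table[0] = 0;
    for (unsigned i = 1; i < 256; ++i)
        bit_table[i] = (unsigned char)(bit_table[i >> 1] + (i & 1u));
    assert(bit_table[255] == 8);  /* sanity check on the last entry */
    bit_table_ready = 1;
}

/* Small helper that is called often: a typical candidate for inlining. */
static inline unsigned count_bits_u8(unsigned char b)
{
    if (!bit_table_ready)
        init_bit_table();
    return bit_table[b];
}

/* Counts the set bits in the low 32 bits of v using four table lookups. */
unsigned count_bits_u32(unsigned long v)
{
    return count_bits_u8((unsigned char)(v & 0xFFu))
         + count_bits_u8((unsigned char)((v >> 8) & 0xFFu))
         + count_bits_u8((unsigned char)((v >> 16) & 0xFFu))
         + count_bits_u8((unsigned char)((v >> 24) & 0xFFu));
}
```

As for "use registers": modern compilers generally do their own register allocation, so keeping hot values in local variables usually matters more than the `register` keyword itself.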