Narrowing the pool to a few good candidates and asking them to complete a more complex, realistic task tells you more than administering an easy test to lots of candidates.
Complex tests take more time to grade, however. I recommend having a few engineers review the code, noting where they would have done things differently. If you give the same test to many candidates, keeping anonymized submissions on file gives your reviewers a useful reference point for comparison.
Generally, whether the code works is the most important criterion when grading. However, you should also pay attention to:
How well organized and readable the code is. Evaluating code this way will come naturally to your engineers; they should be able to easily follow the candidate's reasoning. Code should be organized into logical units (methods, functions, classes, objects, and so on). The candidate should additionally follow good naming conventions and avoid repeating code.
How well the candidate tests for edge cases and error...