So to get the best performance, we want to take advantage of the most up-to-date vector instruction set, and for the KNL, that's MIC-AVX512. You can specify this in your compiler flags with -xMIC-AVX512. AVX-512 is also available on Xeons like Skylake, but in that case you would use -xCORE-AVX512. For the remainder of the presentation, I'm going to be talking about some of the tools and procedures that we use specifically for this application, among which are hot spot identification, roofline plots, intra-node and inter-node scaling studies, the Application Performance Snapshot, Memory Access Analysis, the Vectorization Advisor, Loop Analytics, and the Vectorization Report.
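As a rough illustration of the two compiler flags mentioned above, the following minimal sketch maps a target label to its flag and assembles a compile command. Only -xMIC-AVX512 and -xCORE-AVX512 come from the talk. The target labels, the helper names, the `icc` driver, the `-O3` level, and the `kernel.c` file name are all assumptions made for the example, and the command is only printed, never run.

```python
# Flags quoted in the talk; the dictionary keys are hypothetical labels.
VECTOR_FLAGS = {
    "knl": "-xMIC-AVX512",       # Knights Landing
    "skylake": "-xCORE-AVX512",  # Skylake Xeon
}


def vector_flag(target):
    """Return the vectorization flag for a target label (case-insensitive)."""
    try:
        return VECTOR_FLAGS[target.lower()]
    except KeyError:
        raise ValueError(f"no vector flag known for target {target!r}") from None


def compile_command(source, target, compiler="icc", extra=("-O3",)):
    """Assemble a compile command as an argument list; nothing is executed.

    The compiler name and the extra options are assumptions for this sketch.
    """
    return [compiler, *extra, vector_flag(target), "-c", source]


if __name__ == "__main__":
    # kernel.c is a placeholder file name.
    print(" ".join(compile_command("kernel.c", "knl")))
```

Keeping the flag in one lookup table means a build script can switch between a KNL build and a Skylake build by changing only the target label.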