My approach was to bracket the code being measured with a startClock() call before it and a stopClock() call after it, using clock() and gettimeofday() for the measurements. I calculated the differences (e.g. the microsecond math), stored them in arrays, and aggregated the totals. The rest of the effort went into formatting and displaying the results. The downside of this approach is that it is knitted into the sources fairly tightly. I had hit big roadblocks in processing images, which I largely overcame after running these timings.
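For anyone who wants to try the same thing, here is a minimal sketch of that kind of start/stop timing. Only the names startClock(), stopClock(), clock() and gettimeofday() come from the description above; the array layout, MAX_SAMPLES and reportClock() are assumptions made for illustration, not the original code. Note that clock() reports CPU time used by the process, while gettimeofday() reports wall-clock time, so the two can differ a lot when the code is waiting on I/O.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

#define MAX_SAMPLES 256              /* hypothetical cap on stored samples */

static struct timeval wall_start;    /* wall-clock time at startClock() */
static clock_t cpu_start;            /* CPU time at startClock() */
static long long wall_us[MAX_SAMPLES];
static long long cpu_us[MAX_SAMPLES];
static int sample_count = 0;

/* Call immediately before the code being measured. */
void startClock(void)
{
    gettimeofday(&wall_start, NULL);
    cpu_start = clock();
}

/* Call immediately after the code being measured.
 * Stores one sample and returns the elapsed wall-clock microseconds. */
long long stopClock(void)
{
    struct timeval wall_end;
    clock_t cpu_end = clock();
    gettimeofday(&wall_end, NULL);

    /* Microsecond math: whole seconds scaled up, plus the usec difference
     * (which may be negative when the seconds field has rolled over). */
    long long wall = (long long)(wall_end.tv_sec - wall_start.tv_sec) * 1000000LL
                   + (long long)(wall_end.tv_usec - wall_start.tv_usec);
    long long cpu = (long long)(cpu_end - cpu_start) * 1000000LL / CLOCKS_PER_SEC;

    if (sample_count < MAX_SAMPLES) {   /* silently drop samples past the cap */
        wall_us[sample_count] = wall;
        cpu_us[sample_count] = cpu;
        sample_count++;
    }
    return wall;
}

/* Aggregate the stored samples and print the totals. */
void reportClock(FILE *out)
{
    assert(out != NULL);
    long long wall_total = 0, cpu_total = 0;
    for (int i = 0; i < sample_count; i++) {
        wall_total += wall_us[i];
        cpu_total += cpu_us[i];
    }
    fprintf(out, "samples=%d wall_total=%lld us cpu_total=%lld us\n",
            sample_count, wall_total, cpu_total);
}
```

Usage would be startClock(); followed by the code to measure, then stopClock(); and a single reportClock(stdout); at the end of the run. This also shows the downside mentioned above: the calls have to be placed by hand throughout the sources.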
I was grabbing images from a webcam, analyzing them, and storing them to SD (and USB drives) before serving them up via NGINX, all on the Galileo. It worked well, but saving files was a big delay and choke point. After running the profiling and seeing how big that bottleneck was, I developed an "adaptive rate" approach: the full image is still analyzed, but images are saved compressed (greatly reduced in size, yet still recognizable) except when the full image size is needed.
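One way such an "adaptive rate" decision could look is sketched below. This is only a guess at the general shape, not the code I described: the SavePlan type, planSave(), the quality limits and the idea of steering a JPEG-style quality setting from the last measured save time are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limits for a JPEG-style quality setting. */
#define QUALITY_MIN  20
#define QUALITY_MAX  90
#define QUALITY_STEP 10

typedef struct {
    bool full_size;  /* true: save the unreduced frame */
    int  quality;    /* quality to use for this save */
} SavePlan;

/* Decide how to save the next frame.
 *   full_needed     - the analysis wants the full-size image kept
 *   last_save_us    - measured duration of the previous save (microseconds)
 *   budget_us       - time we are willing to spend per save (assumed tunable)
 *   current_quality - quality used for the previous reduced save
 */
SavePlan planSave(bool full_needed, long long last_save_us,
                  long long budget_us, int current_quality)
{
    assert(budget_us > 0);

    SavePlan plan;
    plan.full_size = full_needed;
    plan.quality = current_quality;

    if (full_needed) {
        /* Full image requested: skip the reduction entirely. */
        plan.quality = QUALITY_MAX;
        return plan;
    }

    if (last_save_us > budget_us) {
        plan.quality -= QUALITY_STEP;      /* too slow: compress harder */
    } else if (last_save_us < budget_us / 2) {
        plan.quality += QUALITY_STEP;      /* plenty of headroom: ease off */
    }

    /* Clamp so the saved image stays recognizable and within range. */
    if (plan.quality < QUALITY_MIN) plan.quality = QUALITY_MIN;
    if (plan.quality > QUALITY_MAX) plan.quality = QUALITY_MAX;
    return plan;
}
```

The last_save_us value could come straight from a timing routine wrapped around the file write, so the same measurements that exposed the bottleneck also drive the save rate.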
Dart is introducing Zones, which could be useful for profiling and performance management (see "Zones" on the Dart site, "Dart: Structured web apps"). Is anyone interested in working towards getting Dart running on the Galileo?