Rationale
Sampling tools like oprofile or dtrace’s profile provider don’t really provide methods to see what [multithreaded] programs are blocking on, only where they spend CPU time. Though advanced techniques exist (such as systemtap and dtrace call-level probes), building on them is overkill. Poor man doesn’t have time. Poor man needs food.
Method
For a poor developer to understand what a program is doing, he needs to see stacks. Once upon a time (back in Linux 2.4) there was a ‘pstack’ tool for that; Solaris has it too. Modern Linux systems, though, do not have such facilities, and one needs to improvise, for example by using debuggers: they can walk threads and provide stacks.
Technology
Getting stacks:
gdb -ex "set pagination 0" -ex "thread apply all bt" \
  --batch -p $(pidof mysqld)
Or for version-impaired (gdb 6.3 and older):
(echo "set pagination 0"; echo "thread apply all bt"; echo "quit"; cat /dev/zero ) | gdb -p $(pidof mysqld)
Collapsing traces (awk!):
BEGIN { s = ""; }
/Thread/ { print s; s = ""; }
/^\#/ { if (s != "" ) { s = s "," $4} else { s = $4 } }
END { print s }
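To see what the collapsing step does without attaching to a live process, the same awk logic can be fed a hand-written backtrace. This is only a sketch: the function name `collapse_stacks` and the sample dump are made up for illustration, and the sample assumes the common gdb frame layout (`#N  ADDRESS in FUNCTION (...)`), in which the function name is the fourth field. Frames printed without an address would put a different token in `$4`.

```shell
#!/bin/bash
# Sketch: collapse a "thread apply all bt" dump into one
# comma-separated line of function names per thread.
# collapse_stacks is a hypothetical helper name, not part of gdb.
collapse_stacks() {
  awk '
    BEGIN { s = "" }
    # A new thread header: flush the stack collected so far.
    /Thread/ { print s; s = "" }
    # A frame line: append the 4th field (the function name).
    /^#/ { if (s != "") { s = s "," $4 } else { s = $4 } }
    # Flush the last thread.
    END { print s }'
}

# Made-up sample dump (addresses and LWP ids are invented).
sample='Thread 2 (Thread 0x7f00 (LWP 101)):
#0  0x0000003a in read () from /lib64/libc.so.6
#1  0x0000004b in my_real_read ()
Thread 1 (Thread 0x7f01 (LWP 100)):
#0  0x0000005c in select () from /lib64/libc.so.6
#1  0x0000006d in main ()'

printf '%s\n' "$sample" | collapse_stacks
```

The first line printed is empty, because the first thread header flushes an empty accumulator. The following lines are `read,my_real_read` and `select,main`.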
Full technology demonstration:
#!/bin/bash
nsamples=1
sleeptime=0
pid=$(pidof mysqld)

for x in $(seq 1 $nsamples)
  do
    gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $pid
    sleep $sleeptime
  done | \
awk '
  BEGIN { s = ""; }
  /Thread/ { print s; s = ""; }
  /^\#/ { if (s != "" ) { s = s "," $4} else { s = $4 } }
  END { print s }' | \
sort | uniq -c | sort -r -n -k 1,1
Output
291 pthread_cond_wait@@GLIBC_2.3.2,one_thread_per_connection_end,handle_one_connection
57 read,my_real_read,my_net_read,do_command,handle_one_connection,start_thread
26 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread
3 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,srv_purge_worker_thread
1 select,os_thread_sleep,srv_purge_thread
1 select,os_thread_sleep,srv_master_thread
1 select,os_thread_sleep,srv_lock_timeout_and_monitor_thread
1 select,os_thread_sleep,srv_error_monitor_thread
1 select,handle_connections_sockets,main,select
1 read,vio_read_buff,my_real_read,my_net_read,cli_safe_read,handle_slave_io
1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_for_mysql,ha_innodb::index_read,handler::index_read_idx_map,join_read_const,join_read_const_table,make_join_statistics,JOIN::optimize,mysql_select,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection
1 pread64,os_file_pread,os_file_read,fil_io,buf_read_page_low,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_index_entry,row_upd_step,row_update_for_mysql,ha_innodb::delete_row,handler::ha_delete_row,mysql_delete,mysql_execute_command,mysql_parse,Query_log_event::do_apply_event,apply_event_and_update_pos,handle_slave_sql
1 pread64,os_file_pread,os_file_read,fil_io,buf_read_page_low,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_for_mysql,ha_innodb::index_read,handler::index_read_idx_map,join_read_const,join_read_const_table,make_join_statistics,JOIN::optimize,mysql_select,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection
1 do_sigwait,sigwait,signal_hand
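The number in the first column is how many sampled threads shared that exact stack. It comes from the last stage of the pipeline (`sort | uniq -c | sort -r -n -k 1,1`). A minimal sketch of just that stage, using invented collapsed stacks and a hypothetical wrapper name `rank_stacks`:

```shell
#!/bin/bash
# Sketch: count identical collapsed stacks, most frequent first.
# rank_stacks is a hypothetical helper name used only here.
rank_stacks() {
  # sort groups identical lines, uniq -c prefixes each with its count,
  # and the final sort orders by that count, descending.
  sort | uniq -c | sort -r -n -k 1,1
}

# Invented input: one collapsed stack per sampled thread.
collapsed='read,my_real_read,do_command
select,main
read,my_real_read,do_command
read,my_real_read,do_command'

printf '%s\n' "$collapsed" | rank_stacks
```

Here the `read` stack is listed first with a count of 3, followed by the `select` stack with a count of 1. Stacks near the top are where most threads were sitting when the samples were taken.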
Success stories and references
We hear this technology has been used by performance engineers at Google, Facebook, Wikipedia, Intel, Sun Microsystems and other places.
- Baron Schwartz
- Mikael Ronstrom
- Mark Callaghan
Credits
Originally this technology was released as a collaborative effort by Mark Callaghan and Domas Mituzas.
PoorMansProfiler on Facebook