Field | Value | Date
---|---|---
author | Taylor Blau <me@ttaylorr.com> | 2022-10-31 04:04:42 +0300
committer | Taylor Blau <me@ttaylorr.com> | 2022-10-31 04:04:42 +0300
commit | e5be3c632af4ea1ec6d9406d30f7f8cf54f5e9e7 |
tree | 0aa932adf25a843b4d072d052f12e61f2b00d32f |
parent | c112d8d9c2687cb6443b628e38aa446e335fb21a |
parent | 81071626ba1ec54ad72de1e0a9a49c78eb87a2c8 |
Merge branch 'jh/trace2-timers-and-counters'
Two new facilities, "timer" and "counter", are introduced to the
trace2 API.
* jh/trace2-timers-and-counters:
trace2: add global counter mechanism
trace2: add stopwatch timers
trace2: convert ctx.thread_name from strbuf to pointer
trace2: improve thread-name documentation in the thread-context
trace2: rename the thread_name argument to trace2_thread_start
api-trace2.txt: elminate section describing the public trace2 API
tr2tls: clarify TLS terminology
trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
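
The counter mechanism merged here keeps a fixed-size block of partial sums in each thread's thread-local storage and folds those sums into a global block under a lock when the thread exits. The following is a minimal standalone sketch of that pattern, not the code from `trace2/tr2_ctr.c`: every `sketch_*` name is hypothetical, C11 `_Thread_local` stands in for the thread-context lookup done by `tr2tls_get_self()`, and a plain pthread mutex stands in for `tr2tls_mutex`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ids; like the real enums they start at 0 and are contiguous. */
enum sketch_counter_id {
	SKETCH_COUNTER_TEST1 = 0,
	SKETCH_COUNTER_TEST2,
	SKETCH_NUMBER_OF_COUNTERS
};

struct sketch_counter_block {
	uint64_t value[SKETCH_NUMBER_OF_COUNTERS];
};

/* Per-thread partial sums, updated without locking (assumption: C11 TLS). */
static _Thread_local struct sketch_counter_block sketch_thread_block;

/* Final sums; only accessed while holding sketch_mutex. */
static struct sketch_counter_block sketch_final_block;
static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Add `value` to the calling thread's partial sum for `cid`. */
static void sketch_counter_add(enum sketch_counter_id cid, uint64_t value)
{
	assert((int)cid >= 0 && (int)cid < SKETCH_NUMBER_OF_COUNTERS);
	sketch_thread_block.value[cid] += value;
}

/* Fold the calling thread's partial sums into the final sums. */
static void sketch_update_final_counters(void)
{
	int k;

	pthread_mutex_lock(&sketch_mutex);
	for (k = 0; k < SKETCH_NUMBER_OF_COUNTERS; k++) {
		sketch_final_block.value[k] += sketch_thread_block.value[k];
		sketch_thread_block.value[k] = 0;
	}
	pthread_mutex_unlock(&sketch_mutex);
}

/* Read one final sum; meaningful once all worker threads have exited. */
static uint64_t sketch_final_value(enum sketch_counter_id cid)
{
	uint64_t v;

	pthread_mutex_lock(&sketch_mutex);
	v = sketch_final_block.value[cid];
	pthread_mutex_unlock(&sketch_mutex);
	return v;
}

/* Thread-proc: add two values, then publish the partial sums on exit. */
static void *sketch_thread_proc(void *arg)
{
	const uint64_t *v = arg;

	sketch_counter_add(SKETCH_COUNTER_TEST2, v[0]);
	sketch_counter_add(SKETCH_COUNTER_TEST2, v[1]);
	sketch_update_final_counters();
	return NULL;
}

/* Run `nr_threads` workers to completion; returns 0 on success, -1 on error. */
static int sketch_run_threads(int nr_threads, uint64_t v1, uint64_t v2)
{
	uint64_t values[2] = { v1, v2 };
	pthread_t *tids = calloc(nr_threads, sizeof(*tids));
	int k, started = 0, ret = 0;

	if (!tids)
		return -1;
	for (k = 0; k < nr_threads; k++) {
		if (pthread_create(&tids[k], NULL, sketch_thread_proc, values)) {
			ret = -1;
			break;
		}
		started++;
	}
	for (k = 0; k < started; k++)
		if (pthread_join(tids[k], NULL))
			ret = -1;
	free(tids);
	return ret;
}
```

Calling `sketch_run_threads(3, 7, 13)` and then reading `sketch_final_value(SKETCH_COUNTER_TEST2)` uses the same numbers as `test-tool trace2 201counter 7 13 3` in the new tests: each thread contributes a partial sum of 20, and the total is 60. Keeping the hot path lock-free and taking the mutex only once per thread is the trade-off the series describes; the cost is that the complete sum is not available until every thread has exited.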
-rw-r--r-- | Documentation/technical/api-trace2.txt | 190
-rw-r--r-- | Makefile | 2
-rw-r--r-- | t/helper/test-trace2.c | 187
-rwxr-xr-x | t/t0211-trace2-perf.sh | 95
-rw-r--r-- | t/t0211/scrub_perf.perl | 6
-rw-r--r-- | trace2.c | 121
-rw-r--r-- | trace2.h | 101
-rw-r--r-- | trace2/tr2_ctr.c | 101
-rw-r--r-- | trace2/tr2_ctr.h | 104
-rw-r--r-- | trace2/tr2_tgt.h | 16
-rw-r--r-- | trace2/tr2_tgt_event.c | 47
-rw-r--r-- | trace2/tr2_tgt_normal.c | 39
-rw-r--r-- | trace2/tr2_tgt_perf.c | 43
-rw-r--r-- | trace2/tr2_tls.c | 34
-rw-r--r-- | trace2/tr2_tls.h | 55
-rw-r--r-- | trace2/tr2_tmr.c | 182
-rw-r--r-- | trace2/tr2_tmr.h | 140
17 files changed, 1361 insertions, 102 deletions
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt index 2afa28bb5aa..de5fc250595 100644 --- a/Documentation/technical/api-trace2.txt +++ b/Documentation/technical/api-trace2.txt @@ -148,20 +148,18 @@ filename collisions). == Trace2 API -All public Trace2 functions and macros are defined in `trace2.h` and -`trace2.c`. All public symbols are prefixed with `trace2_`. +The Trace2 public API is defined and documented in `trace2.h`; refer to it for +more information. All public functions and macros are prefixed +with `trace2_` and are implemented in `trace2.c`. There are no public Trace2 data structures. The Trace2 code also defines a set of private functions and data types in the `trace2/` directory. These symbols are prefixed with `tr2_` -and should only be used by functions in `trace2.c`. +and should only be used by functions in `trace2.c` (or other private +source files in `trace2/`). -== Conventions for Public Functions and Macros - -The functions defined by the Trace2 API are declared and documented -in `trace2.h`. It defines the API functions and wrapper macros for -Trace2. +=== Conventions for Public Functions and Macros Some functions have a `_fl()` suffix to indicate that they take `file` and `line-number` arguments. @@ -172,52 +170,7 @@ take a `va_list` argument. Some functions have a `_printf_fl()` suffix to indicate that they also take a `printf()` style format with a variable number of arguments. -There are CPP wrapper macros and `#ifdef`s to hide most of these details. -See `trace2.h` for more details. The following discussion will only -describe the simplified forms. - -== Public API - -All Trace2 API functions send a message to all of the active -Trace2 Targets. This section describes the set of available -messages. - -It helps to divide these functions into groups for discussion -purposes. - -=== Basic Command Messages - -These are concerned with the lifetime of the overall git process. 
-e.g: `void trace2_initialize_clock()`, `void trace2_initialize()`, -`int trace2_is_enabled()`, `void trace2_cmd_start(int argc, const char **argv)`. - -=== Command Detail Messages - -These are concerned with describing the specific Git command -after the command line, config, and environment are inspected. -e.g: `void trace2_cmd_name(const char *name)`, -`void trace2_cmd_mode(const char *mode)`. - -=== Child Process Messages - -These are concerned with the various spawned child processes, -including shell scripts, git commands, editors, pagers, and hooks. - -e.g: `void trace2_child_start(struct child_process *cmd)`. - -=== Git Thread Messages - -These messages are concerned with Git thread usage. - -e.g: `void trace2_thread_start(const char *thread_name)`. - -=== Region and Data Messages - -These are concerned with recording performance data -over regions or spans of code. e.g: -`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`. - -Refer to trace2.h for details about all trace2 functions. +CPP wrapper macros are defined to hide most of these details. == Trace2 Target Formats @@ -685,8 +638,8 @@ The "exec_id" field is a command-unique id and is only useful if the `"thread_start"`:: This event is generated when a thread is started. It is - generated from *within* the new thread's thread-proc (for TLS - reasons). + generated from *within* the new thread's thread-proc (because + it needs to access data in the thread's thread-local storage). + ------------ { @@ -698,7 +651,7 @@ The "exec_id" field is a command-unique id and is only useful if the `"thread_exit"`:: This event is generated when a thread exits. It is generated - from *within* the thread's thread-proc (for TLS reasons). + from *within* the thread's thread-proc. + ------------ { @@ -816,6 +769,73 @@ The "value" field may be an integer or a string. 
} ------------ +`"th_timer"`:: + This event logs the amount of time that a stopwatch timer was + running in the thread. This event is generated when a thread + exits for timers that requested per-thread events. ++ +------------ +{ + "event":"th_timer", + ... + "category":"my_category", + "name":"my_timer", + "intervals":5, # number of time it was started/stopped + "t_total":0.052741, # total time in seconds it was running + "t_min":0.010061, # shortest interval + "t_max":0.011648 # longest interval +} +------------ + +`"timer"`:: + This event logs the amount of time that a stopwatch timer was + running aggregated across all threads. This event is generated + when the process exits. ++ +------------ +{ + "event":"timer", + ... + "category":"my_category", + "name":"my_timer", + "intervals":5, # number of time it was started/stopped + "t_total":0.052741, # total time in seconds it was running + "t_min":0.010061, # shortest interval + "t_max":0.011648 # longest interval +} +------------ + +`"th_counter"`:: + This event logs the value of a counter variable in a thread. + This event is generated when a thread exits for counters that + requested per-thread events. ++ +------------ +{ + "event":"th_counter", + ... + "category":"my_category", + "name":"my_counter", + "count":23 +} +------------ + +`"counter"`:: + This event logs the value of a counter variable across all threads. + This event is generated when the process exits. The total value + reported here is the sum across all threads. ++ +------------ +{ + "event":"counter", + ... + "category":"my_category", + "name":"my_counter", + "count":23 +} +------------ + + == Example Trace2 API Usage Here is a hypothetical usage of the Trace2 API showing the intended @@ -1206,7 +1226,7 @@ worked on 508 items at offset 2032. Thread "th04" worked on 508 items at offset 508. + This example also shows that thread names are assigned in a racy manner -as each thread starts and allocates TLS storage. +as each thread starts. 
Config (def param) Events:: @@ -1247,6 +1267,60 @@ d0 | main | data | r0 | 0.002126 | 0.002126 | fsy d0 | main | exit | | 0.000470 | | | code:0 d0 | main | atexit | | 0.000477 | | | code:0 ---------------- + +Stopwatch Timer Events:: + + Measure the time spent in a function call or span of code + that might be called from many places within the code + throughout the life of the process. ++ +---------------- +static void expensive_function(void) +{ + trace2_timer_start(TRACE2_TIMER_ID_TEST1); + ... + sleep_millisec(1000); // Do something expensive + ... + trace2_timer_stop(TRACE2_TIMER_ID_TEST1); +} + +static int ut_100timer(int argc, const char **argv) +{ + ... + + expensive_function(); + + // Do something else 1... + + expensive_function(); + + // Do something else 2... + + expensive_function(); + + return 0; +} +---------------- ++ +In this example, we measure the total time spent in +`expensive_function()` regardless of when it is called +in the overall flow of the program. ++ +---------------- +$ export GIT_TRACE2_PERF_BRIEF=1 +$ export GIT_TRACE2_PERF=~/log.perf +$ t/helper/test-tool trace2 100timer 3 1000 +... +$ cat ~/log.perf +d0 | main | version | | | | | ... 
+d0 | main | start | | 0.001453 | | | t/helper/test-tool trace2 100timer 3 1000 +d0 | main | cmd_name | | | | | trace2 (trace2) +d0 | main | exit | | 3.003667 | | | code:0 +d0 | main | timer | | | | test | name:test1 intervals:3 total:3.001686 min:1.000254 max:1.000929 +d0 | main | atexit | | 3.003796 | | | code:0 +---------------- + + == Future Work === Relationship to the Existing Trace Api (api-trace.txt) @@ -1095,6 +1095,7 @@ LIB_OBJS += trace.o LIB_OBJS += trace2.o LIB_OBJS += trace2/tr2_cfg.o LIB_OBJS += trace2/tr2_cmd_name.o +LIB_OBJS += trace2/tr2_ctr.o LIB_OBJS += trace2/tr2_dst.o LIB_OBJS += trace2/tr2_sid.o LIB_OBJS += trace2/tr2_sysenv.o @@ -1103,6 +1104,7 @@ LIB_OBJS += trace2/tr2_tgt_event.o LIB_OBJS += trace2/tr2_tgt_normal.o LIB_OBJS += trace2/tr2_tgt_perf.o LIB_OBJS += trace2/tr2_tls.o +LIB_OBJS += trace2/tr2_tmr.o LIB_OBJS += trailer.o LIB_OBJS += transport-helper.o LIB_OBJS += transport.o diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c index a714130ece7..1b092c60714 100644 --- a/t/helper/test-trace2.c +++ b/t/helper/test-trace2.c @@ -229,6 +229,187 @@ static int ut_010bug_BUG(int argc, const char **argv) } /* + * Single-threaded timer test. Create several intervals using the + * TEST1 timer. The test script can verify that an aggregate Trace2 + * "timer" event is emitted indicating that we started+stopped the + * timer the requested number of times. 
+ */ +static int ut_100timer(int argc, const char **argv) +{ + const char *usage_error = + "expect <count> <ms_delay>"; + + int count = 0; + int delay = 0; + int k; + + if (argc != 2) + die("%s", usage_error); + if (get_i(&count, argv[0])) + die("%s", usage_error); + if (get_i(&delay, argv[1])) + die("%s", usage_error); + + for (k = 0; k < count; k++) { + trace2_timer_start(TRACE2_TIMER_ID_TEST1); + sleep_millisec(delay); + trace2_timer_stop(TRACE2_TIMER_ID_TEST1); + } + + return 0; +} + +struct ut_101_data { + int count; + int delay; +}; + +static void *ut_101timer_thread_proc(void *_ut_101_data) +{ + struct ut_101_data *data = _ut_101_data; + int k; + + trace2_thread_start("ut_101"); + + for (k = 0; k < data->count; k++) { + trace2_timer_start(TRACE2_TIMER_ID_TEST2); + sleep_millisec(data->delay); + trace2_timer_stop(TRACE2_TIMER_ID_TEST2); + } + + trace2_thread_exit(); + return NULL; +} + +/* + * Multi-threaded timer test. Create several threads that each create + * several intervals using the TEST2 timer. The test script can verify + * that an individual Trace2 "th_timer" events for each thread and an + * aggregate "timer" event are generated. 
+ */ +static int ut_101timer(int argc, const char **argv) +{ + const char *usage_error = + "expect <count> <ms_delay> <threads>"; + + struct ut_101_data data = { 0, 0 }; + int nr_threads = 0; + int k; + pthread_t *pids = NULL; + + if (argc != 3) + die("%s", usage_error); + if (get_i(&data.count, argv[0])) + die("%s", usage_error); + if (get_i(&data.delay, argv[1])) + die("%s", usage_error); + if (get_i(&nr_threads, argv[2])) + die("%s", usage_error); + + CALLOC_ARRAY(pids, nr_threads); + + for (k = 0; k < nr_threads; k++) { + if (pthread_create(&pids[k], NULL, ut_101timer_thread_proc, &data)) + die("failed to create thread[%d]", k); + } + + for (k = 0; k < nr_threads; k++) { + if (pthread_join(pids[k], NULL)) + die("failed to join thread[%d]", k); + } + + free(pids); + + return 0; +} + +/* + * Single-threaded counter test. Add several values to the TEST1 counter. + * The test script can verify that the final sum is reported in the "counter" + * event. + */ +static int ut_200counter(int argc, const char **argv) +{ + const char *usage_error = + "expect <v1> [<v2> [...]]"; + int value; + int k; + + if (argc < 1) + die("%s", usage_error); + + for (k = 0; k < argc; k++) { + if (get_i(&value, argv[k])) + die("invalid value[%s] -- %s", + argv[k], usage_error); + trace2_counter_add(TRACE2_COUNTER_ID_TEST1, value); + } + + return 0; +} + +/* + * Multi-threaded counter test. Create several threads that each increment + * the TEST2 global counter. The test script can verify that an individual + * "th_counter" event is generated with a partial sum for each thread and + * that a final aggregate "counter" event is generated. 
+ */ + +struct ut_201_data { + int v1; + int v2; +}; + +static void *ut_201counter_thread_proc(void *_ut_201_data) +{ + struct ut_201_data *data = _ut_201_data; + + trace2_thread_start("ut_201"); + + trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v1); + trace2_counter_add(TRACE2_COUNTER_ID_TEST2, data->v2); + + trace2_thread_exit(); + return NULL; +} + +static int ut_201counter(int argc, const char **argv) +{ + const char *usage_error = + "expect <v1> <v2> <threads>"; + + struct ut_201_data data = { 0, 0 }; + int nr_threads = 0; + int k; + pthread_t *pids = NULL; + + if (argc != 3) + die("%s", usage_error); + if (get_i(&data.v1, argv[0])) + die("%s", usage_error); + if (get_i(&data.v2, argv[1])) + die("%s", usage_error); + if (get_i(&nr_threads, argv[2])) + die("%s", usage_error); + + CALLOC_ARRAY(pids, nr_threads); + + for (k = 0; k < nr_threads; k++) { + if (pthread_create(&pids[k], NULL, ut_201counter_thread_proc, &data)) + die("failed to create thread[%d]", k); + } + + for (k = 0; k < nr_threads; k++) { + if (pthread_join(pids[k], NULL)) + die("failed to join thread[%d]", k); + } + + free(pids); + + return 0; +} + +/* * Usage: * test-tool trace2 <ut_name_1> <ut_usage_1> * test-tool trace2 <ut_name_2> <ut_usage_2> @@ -248,6 +429,12 @@ static struct unit_test ut_table[] = { { ut_008bug, "008bug", "" }, { ut_009bug_BUG, "009bug_BUG","" }, { ut_010bug_BUG, "010bug_BUG","" }, + + { ut_100timer, "100timer", "<count> <ms_delay>" }, + { ut_101timer, "101timer", "<count> <ms_delay> <threads>" }, + + { ut_200counter, "200counter", "<v1> [<v2> [<v3> [...]]]" }, + { ut_201counter, "201counter", "<v1> <v2> <threads>" }, }; /* clang-format on */ diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh index 22d0845544e..0b3436e8cac 100755 --- a/t/t0211-trace2-perf.sh +++ b/t/t0211-trace2-perf.sh @@ -173,4 +173,99 @@ test_expect_success 'using global config, perf stream, return code 0' ' test_cmp expect actual ' +# Exercise the stopwatch timers in a loop and confirm 
that we have +# as many start/stop intervals as expected. We cannot really test the +# actual (total, min, max) timer values, so we have to assume that they +# are good, but we can verify the interval count. +# +# The timer "test/test1" should only emit a global summary "timer" event. +# The timer "test/test2" should emit per-thread "th_timer" events and a +# global summary "timer" event. + +have_timer_event () { + thread=$1 event=$2 category=$3 name=$4 intervals=$5 file=$6 && + + pattern="d0|${thread}|${event}||||${category}|name:${name} intervals:${intervals}" && + + grep "${pattern}" ${file} +} + +test_expect_success 'stopwatch timer test/test1' ' + test_when_finished "rm trace.perf actual" && + test_config_global trace2.perfBrief 1 && + test_config_global trace2.perfTarget "$(pwd)/trace.perf" && + + # Use the timer "test1" 5 times from "main". + test-tool trace2 100timer 5 10 && + + perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual && + + have_timer_event "main" "timer" "test" "test1" 5 actual +' + +test_expect_success 'stopwatch timer test/test2' ' + test_when_finished "rm trace.perf actual" && + test_config_global trace2.perfBrief 1 && + test_config_global trace2.perfTarget "$(pwd)/trace.perf" && + + # Use the timer "test2" 5 times each in 3 threads. + test-tool trace2 101timer 5 10 3 && + + perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual && + + # So we should have 3 per-thread events of 5 each. + have_timer_event "th01:ut_101" "th_timer" "test" "test2" 5 actual && + have_timer_event "th02:ut_101" "th_timer" "test" "test2" 5 actual && + have_timer_event "th03:ut_101" "th_timer" "test" "test2" 5 actual && + + # And we should have 15 total uses. + have_timer_event "main" "timer" "test" "test2" 15 actual +' + +# Exercise the global counters and confirm that we get the expected values. +# +# The counter "test/test1" should only emit a global summary "counter" event. 
+# The counter "test/test2" should emit per-thread "th_counter" events and a +# global summary "counter" event. + +have_counter_event () { + thread=$1 event=$2 category=$3 name=$4 value=$5 file=$6 && + + pattern="d0|${thread}|${event}||||${category}|name:${name} value:${value}" && + + grep "${pattern}" ${file} +} + +test_expect_success 'global counter test/test1' ' + test_when_finished "rm trace.perf actual" && + test_config_global trace2.perfBrief 1 && + test_config_global trace2.perfTarget "$(pwd)/trace.perf" && + + # Use the counter "test1" and add n integers. + test-tool trace2 200counter 1 2 3 4 5 && + + perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual && + + have_counter_event "main" "counter" "test" "test1" 15 actual +' + +test_expect_success 'global counter test/test2' ' + test_when_finished "rm trace.perf actual" && + test_config_global trace2.perfBrief 1 && + test_config_global trace2.perfTarget "$(pwd)/trace.perf" && + + # Add 2 integers to the counter "test2" in each of 3 threads. + test-tool trace2 201counter 7 13 3 && + + perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <trace.perf >actual && + + # So we should have 3 per-thread events of 20 each. + have_counter_event "th01:ut_201" "th_counter" "test" "test2" 20 actual && + have_counter_event "th02:ut_201" "th_counter" "test" "test2" 20 actual && + have_counter_event "th03:ut_201" "th_counter" "test" "test2" 20 actual && + + # And we should have a single event with the total across all threads. 
+ have_counter_event "main" "counter" "test" "test2" 60 actual +' + test_done diff --git a/t/t0211/scrub_perf.perl b/t/t0211/scrub_perf.perl index 299999f0f89..7a50bae6463 100644 --- a/t/t0211/scrub_perf.perl +++ b/t/t0211/scrub_perf.perl @@ -64,6 +64,12 @@ while (<>) { goto SKIP_LINE; } } + elsif ($tokens[$col_event] =~ m/timer/) { + # This also captures "th_timer" events + $tokens[$col_rest] =~ s/ total:\d+\.\d*/ total:_T_TOTAL_/; + $tokens[$col_rest] =~ s/ min:\d+\.\d*/ min:_T_MIN_/; + $tokens[$col_rest] =~ s/ max:\d+\.\d*/ max:_T_MAX_/; + } # t_abs and t_rel are either blank or a float. Replace the float # with a constant for matching the HEREDOC in the test script. @@ -8,11 +8,13 @@ #include "version.h" #include "trace2/tr2_cfg.h" #include "trace2/tr2_cmd_name.h" +#include "trace2/tr2_ctr.h" #include "trace2/tr2_dst.h" #include "trace2/tr2_sid.h" #include "trace2/tr2_sysenv.h" #include "trace2/tr2_tgt.h" #include "trace2/tr2_tls.h" +#include "trace2/tr2_tmr.h" static int trace2_enabled; @@ -52,7 +54,7 @@ static struct tr2_tgt *tr2_tgt_builtins[] = * Force (rather than lazily) initialize any of the requested * builtin TRACE2 targets at startup (and before we've seen an * actual TRACE2 event call) so we can see if we need to setup - * the TR2 and TLS machinery. + * private data structures and thread-local storage. * * Return the number of builtin targets enabled. */ @@ -83,6 +85,39 @@ static void tr2_tgt_disable_builtins(void) tgt_j->pfn_term(); } +/* + * The signature of this function must match the pfn_timer + * method in the targets. (Think of this is an apply operation + * across the set of active targets.) 
+ */ +static void tr2_tgt_emit_a_timer(const struct tr2_timer_metadata *meta, + const struct tr2_timer *timer, + int is_final_data) +{ + struct tr2_tgt *tgt_j; + int j; + + for_each_wanted_builtin (j, tgt_j) + if (tgt_j->pfn_timer) + tgt_j->pfn_timer(meta, timer, is_final_data); +} + +/* + * The signature of this function must match the pfn_counter + * method in the targets. + */ +static void tr2_tgt_emit_a_counter(const struct tr2_counter_metadata *meta, + const struct tr2_counter *counter, + int is_final_data) +{ + struct tr2_tgt *tgt_j; + int j; + + for_each_wanted_builtin (j, tgt_j) + if (tgt_j->pfn_counter) + tgt_j->pfn_counter(meta, counter, is_final_data); +} + static int tr2main_exit_code; /* @@ -110,6 +145,32 @@ static void tr2main_atexit_handler(void) */ tr2tls_pop_unwind_self(); + /* + * Some timers want per-thread details. If the main thread + * used one of those timers, emit the details now (before + * we emit the aggregate timer values). + * + * Likewise for counters. + */ + tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer); + tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter); + + /* + * Add stopwatch timer and counter data for the main thread to + * the final totals. And then emit the final values. + * + * Technically, we shouldn't need to hold the lock to update + * and output the final_timer_block and final_counter_block + * (since all other threads should be dead by now), but it + * doesn't hurt anything. 
+ */ + tr2tls_lock(); + tr2_update_final_timers(); + tr2_update_final_counters(); + tr2_emit_final_timers(tr2_tgt_emit_a_timer); + tr2_emit_final_counters(tr2_tgt_emit_a_counter); + tr2tls_unlock(); + for_each_wanted_builtin (j, tgt_j) if (tgt_j->pfn_atexit) tgt_j->pfn_atexit(us_elapsed_absolute, @@ -466,7 +527,7 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code) file, line, us_elapsed_absolute, exec_id, code); } -void trace2_thread_start_fl(const char *file, int line, const char *thread_name) +void trace2_thread_start_fl(const char *file, int line, const char *thread_base_name) { struct tr2_tgt *tgt_j; int j; @@ -488,14 +549,14 @@ void trace2_thread_start_fl(const char *file, int line, const char *thread_name) */ trace2_region_enter_printf_fl(file, line, NULL, NULL, NULL, "thread-proc on main: %s", - thread_name); + thread_base_name); return; } us_now = getnanotime() / 1000; us_elapsed_absolute = tr2tls_absolute_elapsed(us_now); - tr2tls_create_self(thread_name, us_now); + tr2tls_create_self(thread_base_name, us_now); for_each_wanted_builtin (j, tgt_j) if (tgt_j->pfn_thread_start_fl) @@ -541,6 +602,25 @@ void trace2_thread_exit_fl(const char *file, int line) tr2tls_pop_unwind_self(); us_elapsed_thread = tr2tls_region_elasped_self(us_now); + /* + * Some timers want per-thread details. If this thread used + * one of those timers, emit the details now. + * + * Likewise for counters. + */ + tr2_emit_per_thread_timers(tr2_tgt_emit_a_timer); + tr2_emit_per_thread_counters(tr2_tgt_emit_a_counter); + + /* + * Add stopwatch timer and counter data from the current + * (non-main) thread to the final totals. (We'll accumulate + * data for the main thread later during "atexit".) 
+ */ + tr2tls_lock(); + tr2_update_final_timers(); + tr2_update_final_counters(); + tr2tls_unlock(); + for_each_wanted_builtin (j, tgt_j) if (tgt_j->pfn_thread_exit_fl) tgt_j->pfn_thread_exit_fl(file, line, @@ -795,6 +875,39 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...) va_end(ap); } +void trace2_timer_start(enum trace2_timer_id tid) +{ + if (!trace2_enabled) + return; + + if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS) + BUG("trace2_timer_start: invalid timer id: %d", tid); + + tr2_start_timer(tid); +} + +void trace2_timer_stop(enum trace2_timer_id tid) +{ + if (!trace2_enabled) + return; + + if (tid < 0 || tid >= TRACE2_NUMBER_OF_TIMERS) + BUG("trace2_timer_stop: invalid timer id: %d", tid); + + tr2_stop_timer(tid); +} + +void trace2_counter_add(enum trace2_counter_id cid, uint64_t value) +{ + if (!trace2_enabled) + return; + + if (cid < 0 || cid >= TRACE2_NUMBER_OF_COUNTERS) + BUG("trace2_counter_add: invalid counter id: %d", cid); + + tr2_counter_increment(cid, value); +} + const char *trace2_session_id(void) { return tr2_sid_get(); @@ -51,6 +51,8 @@ struct json_writer; * [] trace2_region* -- emit region nesting messages. * [] trace2_data* -- emit region/thread/repo data messages. * [] trace2_printf* -- legacy trace[1] messages. + * [] trace2_timer* -- stopwatch timers (messages are deferred). + * [] trace2_counter* -- global counters (messages are deferred). */ /* @@ -73,8 +75,7 @@ void trace2_initialize_clock(void); /* * Initialize TRACE2 tracing facility if any of the builtin TRACE2 * targets are enabled in the system config or the environment. - * This includes setting up the Trace2 thread local storage (TLS). - * Emits a 'version' message containing the version of git + * This emits a 'version' message containing the version of git * and the Trace2 protocol. 
* * This function should be called from `main()` as early as possible in @@ -302,21 +303,23 @@ void trace2_exec_result_fl(const char *file, int line, int exec_id, int code); /* * Emit a 'thread_start' event. This must be called from inside the - * thread-proc to set up the trace2 TLS data for the thread. + * thread-proc to allow the thread to create its own thread-local + * storage. * - * Thread names should be descriptive, like "preload_index". - * Thread names will be decorated with an instance number automatically. + * The thread base name should be descriptive, like "preload_index" or + * taken from the thread-proc function. A unique thread name will be + * created from the given base name and the thread id automatically. */ void trace2_thread_start_fl(const char *file, int line, - const char *thread_name); + const char *thread_base_name); -#define trace2_thread_start(thread_name) \ - trace2_thread_start_fl(__FILE__, __LINE__, (thread_name)) +#define trace2_thread_start(thread_base_name) \ + trace2_thread_start_fl(__FILE__, __LINE__, (thread_base_name)) /* * Emit a 'thread_exit' event. This must be called from inside the - * thread-proc to report thread-specific data and cleanup TLS data - * for the thread. + * thread-proc so that the thread can access and clean up its + * thread-local storage. */ void trace2_thread_exit_fl(const char *file, int line); @@ -485,6 +488,84 @@ void trace2_printf_fl(const char *file, int line, const char *fmt, ...); #define trace2_printf(...) trace2_printf_fl(__FILE__, __LINE__, __VA_ARGS__) /* + * Define the set of stopwatch timers. + * + * We can add more at any time, but they must be defined at compile + * time (to avoid the need to dynamically allocate and synchronize + * them between different threads). + * + * These must start at 0 and be contiguous (because we use them + * elsewhere as array indexes). + * + * Any values added to this enum must also be added to the + * `tr2_timer_metadata[]` in `trace2/tr2_tmr.c`. 
+ */ +enum trace2_timer_id { + /* + * Define two timers for testing. See `t/helper/test-trace2.c`. + * These can be used for ad hoc testing, but should not be used + * for permanent analysis code. + */ + TRACE2_TIMER_ID_TEST1 = 0, /* emits summary event only */ + TRACE2_TIMER_ID_TEST2, /* emits summary and thread events */ + + /* Add additional timer definitions before here. */ + TRACE2_NUMBER_OF_TIMERS +}; + +/* + * Start/Stop the indicated stopwatch timer in the current thread. + * + * The time spent by the current thread between the _start and _stop + * calls will be added to the thread's partial sum for this timer. + * + * Timer events are emitted at thread and program exit. + * + * Note: Since the stopwatch API routines do not generate individual + * events, they do not take (file, line) arguments. Similarly, the + * category and timer name values are defined at compile-time in the + * timer definitions array, so they are not needed here in the API. + */ +void trace2_timer_start(enum trace2_timer_id tid); +void trace2_timer_stop(enum trace2_timer_id tid); + +/* + * Define the set of global counters. + * + * We can add more at any time, but they must be defined at compile + * time (to avoid the need to dynamically allocate and synchronize + * them between different threads). + * + * These must start at 0 and be contiguous (because we use them + * elsewhere as array indexes). + * + * Any values added to this enum must also be added to the + * `tr2_counter_metadata[]` in `trace2/tr2_ctr.c`. + */ +enum trace2_counter_id { + /* + * Define two counters for testing. See `t/helper/test-trace2.c`. + * These can be used for ad hoc testing, but should not be used + * for permanent analysis code. + */ + TRACE2_COUNTER_ID_TEST1 = 0, /* emits summary event only */ + TRACE2_COUNTER_ID_TEST2, /* emits summary and thread events */ + + /* Add additional counter definitions before here. */ + TRACE2_NUMBER_OF_COUNTERS +}; + +/* + * Increase the named global counter by value. 
+ * + * Note that this adds `value` to the current thread's partial sum for + * this counter (without locking) and that the complete sum is not + * available until all threads have exited, so it does not return the + * new value of the counter. + */ +void trace2_counter_add(enum trace2_counter_id cid, uint64_t value); + +/* * Optional platform-specific code to dump information about the * current and any parent process(es). This is intended to allow * post-processors to know who spawned this git instance and anything diff --git a/trace2/tr2_ctr.c b/trace2/tr2_ctr.c new file mode 100644 index 00000000000..483ca7c308f --- /dev/null +++ b/trace2/tr2_ctr.c @@ -0,0 +1,101 @@ +#include "cache.h" +#include "thread-utils.h" +#include "trace2/tr2_tgt.h" +#include "trace2/tr2_tls.h" +#include "trace2/tr2_ctr.h" + +/* + * A global counter block to aggregrate values from the partial sums + * from each thread. + */ +static struct tr2_counter_block final_counter_block; /* access under tr2tls_mutex */ + +/* + * Define metadata for each global counter. + * + * This array must match the "enum trace2_counter_id" and the values + * in "struct tr2_counter_block.counter[*]". + */ +static struct tr2_counter_metadata tr2_counter_metadata[TRACE2_NUMBER_OF_COUNTERS] = { + [TRACE2_COUNTER_ID_TEST1] = { + .category = "test", + .name = "test1", + .want_per_thread_events = 0, + }, + [TRACE2_COUNTER_ID_TEST2] = { + .category = "test", + .name = "test2", + .want_per_thread_events = 1, + }, + + /* Add additional metadata before here. 
*/ +}; + +void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value) +{ + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); + struct tr2_counter *c = &ctx->counter_block.counter[cid]; + + c->value += value; + + ctx->used_any_counter = 1; + if (tr2_counter_metadata[cid].want_per_thread_events) + ctx->used_any_per_thread_counter = 1; +} + +void tr2_update_final_counters(void) +{ + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); + enum trace2_counter_id cid; + + if (!ctx->used_any_counter) + return; + + /* + * Access `final_counter_block` requires holding `tr2tls_mutex`. + * We assume that our caller is holding the lock. + */ + + for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) { + struct tr2_counter *c_final = &final_counter_block.counter[cid]; + const struct tr2_counter *c = &ctx->counter_block.counter[cid]; + + c_final->value += c->value; + } +} + +void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply) +{ + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); + enum trace2_counter_id cid; + + if (!ctx->used_any_per_thread_counter) + return; + + /* + * For each counter, if the counter wants per-thread events + * and this thread used it (the value is non-zero), emit it. + */ + for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) + if (tr2_counter_metadata[cid].want_per_thread_events && + ctx->counter_block.counter[cid].value) + fn_apply(&tr2_counter_metadata[cid], + &ctx->counter_block.counter[cid], + 0); +} + +void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply) +{ + enum trace2_counter_id cid; + + /* + * Access `final_counter_block` requires holding `tr2tls_mutex`. + * We assume that our caller is holding the lock. 
+ */ + + for (cid = 0; cid < TRACE2_NUMBER_OF_COUNTERS; cid++) + if (final_counter_block.counter[cid].value) + fn_apply(&tr2_counter_metadata[cid], + &final_counter_block.counter[cid], + 1); +} diff --git a/trace2/tr2_ctr.h b/trace2/tr2_ctr.h new file mode 100644 index 00000000000..a2267ee9901 --- /dev/null +++ b/trace2/tr2_ctr.h @@ -0,0 +1,104 @@ +#ifndef TR2_CTR_H +#define TR2_CTR_H + +#include "trace2.h" +#include "trace2/tr2_tgt.h" + +/* + * Define a mechanism to allow global "counters". + * + * Counters can be used count interesting activity that does not fit + * the "region and data" model, such as code called from many + * different regions and/or where you want to count a number of items, + * but don't have control of when the last item will be processed, + * such as counter the number of calls to `lstat()`. + * + * Counters differ from Trace2 "data" events. Data events are emitted + * immediately and are appropriate for documenting loop counters at + * the end of a region, for example. Counter values are accumulated + * during the program and final counter values are emitted at program + * exit. + * + * To make this model efficient, we define a compile-time fixed set of + * counters and counter ids using a fixed size "counter block" array + * in thread-local storage. This gives us constant time, lock-free + * access to each counter within each thread. This lets us avoid the + * complexities of dynamically allocating a counter and sharing that + * definition with other threads. + * + * Each thread uses the counter block in its thread-local storage to + * increment partial sums for each counter (without locking). When a + * thread exits, those partial sums are (under lock) added to the + * global final sum. + * + * Partial sums for each counter are optionally emitted when a thread + * exits. + * + * Final sums for each counter are emitted between the "exit" and + * "atexit" events. 
+ * + * A parallel "counter metadata" table contains the "category" and + * "name" fields for each counter. This eliminates the need to + * include those args in the various counter APIs. + */ + +/* + * The definition of an individual counter as used by an individual + * thread (and later in aggregation). + */ +struct tr2_counter { + uint64_t value; +}; + +/* + * Metadata for a counter. + */ +struct tr2_counter_metadata { + const char *category; + const char *name; + + /* + * True if we should emit per-thread events for this counter + * when individual threads exit. + */ + unsigned int want_per_thread_events:1; +}; + +/* + * A compile-time fixed block of counters to insert into thread-local + * storage. This wrapper is used to avoid quirks of C and the usual + * need to pass an array size argument. + */ +struct tr2_counter_block { + struct tr2_counter counter[TRACE2_NUMBER_OF_COUNTERS]; +}; + +/* + * Private routines used by trace2.c to increment a counter for the + * current thread. + */ +void tr2_counter_increment(enum trace2_counter_id cid, uint64_t value); + +/* + * Add the current thread's counter data to the global totals. + * This is called during thread-exit. + * + * Caller must be holding the tr2tls_mutex. + */ +void tr2_update_final_counters(void); + +/* + * Emit per-thread counter data for the current thread. + * This is called during thread-exit. + */ +void tr2_emit_per_thread_counters(tr2_tgt_evt_counter_t *fn_apply); + +/* + * Emit global counter values. + * This is called during atexit handling. + * + * Caller must be holding the tr2tls_mutex. 
+ */ +void tr2_emit_final_counters(tr2_tgt_evt_counter_t *fn_apply); + +#endif /* TR2_CTR_H */ diff --git a/trace2/tr2_tgt.h b/trace2/tr2_tgt.h index 65f94e15748..bf8745c4f05 100644 --- a/trace2/tr2_tgt.h +++ b/trace2/tr2_tgt.h @@ -4,6 +4,12 @@ struct child_process; struct repository; struct json_writer; +struct tr2_timer_metadata; +struct tr2_timer; +struct tr2_counter_metadata; +struct tr2_counter; + +#define NS_TO_SEC(ns) ((double)(ns) / 1.0e9) /* * Function prototypes for a TRACE2 "target" vtable. @@ -96,6 +102,14 @@ typedef void(tr2_tgt_evt_printf_va_fl_t)(const char *file, int line, uint64_t us_elapsed_absolute, const char *fmt, va_list ap); +typedef void(tr2_tgt_evt_timer_t)(const struct tr2_timer_metadata *meta, + const struct tr2_timer *timer, + int is_final_data); + +typedef void(tr2_tgt_evt_counter_t)(const struct tr2_counter_metadata *meta, + const struct tr2_counter *counter, + int is_final_data); + /* * "vtable" for a TRACE2 target. Use NULL if a target does not want * to emit that message. @@ -132,6 +146,8 @@ struct tr2_tgt { tr2_tgt_evt_data_fl_t *pfn_data_fl; tr2_tgt_evt_data_json_fl_t *pfn_data_json_fl; tr2_tgt_evt_printf_va_fl_t *pfn_printf_va_fl; + tr2_tgt_evt_timer_t *pfn_timer; + tr2_tgt_evt_counter_t *pfn_counter; }; /* clang-format on */ diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c index 37a3163be12..16f6332755e 100644 --- a/trace2/tr2_tgt_event.c +++ b/trace2/tr2_tgt_event.c @@ -9,6 +9,7 @@ #include "trace2/tr2_sysenv.h" #include "trace2/tr2_tgt.h" #include "trace2/tr2_tls.h" +#include "trace2/tr2_tmr.h" static struct tr2_dst tr2dst_event = { .sysenv_var = TR2_SYSENV_EVENT, @@ -90,7 +91,7 @@ static void event_fmt_prepare(const char *event_name, const char *file, jw_object_string(jw, "event", event_name); jw_object_string(jw, "sid", tr2_sid_get()); - jw_object_string(jw, "thread", ctx->thread_name.buf); + jw_object_string(jw, "thread", ctx->thread_name); /* * In brief mode, only emit <time> on these 2 event types. 
@@ -617,6 +618,48 @@ static void fn_data_json_fl(const char *file, int line, } } +static void fn_timer(const struct tr2_timer_metadata *meta, + const struct tr2_timer *timer, + int is_final_data) +{ + const char *event_name = is_final_data ? "timer" : "th_timer"; + struct json_writer jw = JSON_WRITER_INIT; + double t_total = NS_TO_SEC(timer->total_ns); + double t_min = NS_TO_SEC(timer->min_ns); + double t_max = NS_TO_SEC(timer->max_ns); + + jw_object_begin(&jw, 0); + event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw); + jw_object_string(&jw, "category", meta->category); + jw_object_string(&jw, "name", meta->name); + jw_object_intmax(&jw, "intervals", timer->interval_count); + jw_object_double(&jw, "t_total", 6, t_total); + jw_object_double(&jw, "t_min", 6, t_min); + jw_object_double(&jw, "t_max", 6, t_max); + jw_end(&jw); + + tr2_dst_write_line(&tr2dst_event, &jw.json); + jw_release(&jw); +} + +static void fn_counter(const struct tr2_counter_metadata *meta, + const struct tr2_counter *counter, + int is_final_data) +{ + const char *event_name = is_final_data ? 
"counter" : "th_counter"; + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + event_fmt_prepare(event_name, __FILE__, __LINE__, NULL, &jw); + jw_object_string(&jw, "category", meta->category); + jw_object_string(&jw, "name", meta->name); + jw_object_intmax(&jw, "count", counter->value); + jw_end(&jw); + + tr2_dst_write_line(&tr2dst_event, &jw.json); + jw_release(&jw); +} + struct tr2_tgt tr2_tgt_event = { .pdst = &tr2dst_event, @@ -648,4 +691,6 @@ struct tr2_tgt tr2_tgt_event = { .pfn_data_fl = fn_data_fl, .pfn_data_json_fl = fn_data_json_fl, .pfn_printf_va_fl = NULL, + .pfn_timer = fn_timer, + .pfn_counter = fn_counter, }; diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c index 69f80330778..fbbef68dfc0 100644 --- a/trace2/tr2_tgt_normal.c +++ b/trace2/tr2_tgt_normal.c @@ -8,6 +8,7 @@ #include "trace2/tr2_tbuf.h" #include "trace2/tr2_tgt.h" #include "trace2/tr2_tls.h" +#include "trace2/tr2_tmr.h" static struct tr2_dst tr2dst_normal = { .sysenv_var = TR2_SYSENV_NORMAL, @@ -329,6 +330,42 @@ static void fn_printf_va_fl(const char *file, int line, strbuf_release(&buf_payload); } +static void fn_timer(const struct tr2_timer_metadata *meta, + const struct tr2_timer *timer, + int is_final_data) +{ + const char *event_name = is_final_data ? "timer" : "th_timer"; + struct strbuf buf_payload = STRBUF_INIT; + double t_total = NS_TO_SEC(timer->total_ns); + double t_min = NS_TO_SEC(timer->min_ns); + double t_max = NS_TO_SEC(timer->max_ns); + + strbuf_addf(&buf_payload, ("%s %s/%s" + " intervals:%"PRIu64 + " total:%8.6f min:%8.6f max:%8.6f"), + event_name, meta->category, meta->name, + timer->interval_count, + t_total, t_min, t_max); + + normal_io_write_fl(__FILE__, __LINE__, &buf_payload); + strbuf_release(&buf_payload); +} + +static void fn_counter(const struct tr2_counter_metadata *meta, + const struct tr2_counter *counter, + int is_final_data) +{ + const char *event_name = is_final_data ? 
"counter" : "th_counter"; + struct strbuf buf_payload = STRBUF_INIT; + + strbuf_addf(&buf_payload, "%s %s/%s value:%"PRIu64, + event_name, meta->category, meta->name, + counter->value); + + normal_io_write_fl(__FILE__, __LINE__, &buf_payload); + strbuf_release(&buf_payload); +} + struct tr2_tgt tr2_tgt_normal = { .pdst = &tr2dst_normal, @@ -360,4 +397,6 @@ struct tr2_tgt tr2_tgt_normal = { .pfn_data_fl = NULL, .pfn_data_json_fl = NULL, .pfn_printf_va_fl = fn_printf_va_fl, + .pfn_timer = fn_timer, + .pfn_counter = fn_counter, }; diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c index 8cb792488c8..adae8032639 100644 --- a/trace2/tr2_tgt_perf.c +++ b/trace2/tr2_tgt_perf.c @@ -10,6 +10,7 @@ #include "trace2/tr2_tbuf.h" #include "trace2/tr2_tgt.h" #include "trace2/tr2_tls.h" +#include "trace2/tr2_tmr.h" static struct tr2_dst tr2dst_perf = { .sysenv_var = TR2_SYSENV_PERF, @@ -108,7 +109,7 @@ static void perf_fmt_prepare(const char *event_name, strbuf_addf(buf, "d%d | ", tr2_sid_depth()); strbuf_addf(buf, "%-*s | %-*s | ", TR2_MAX_THREAD_NAME, - ctx->thread_name.buf, TR2FMT_PERF_MAX_EVENT_NAME, + ctx->thread_name, TR2FMT_PERF_MAX_EVENT_NAME, event_name); len = buf->len + TR2FMT_PERF_REPO_WIDTH; @@ -555,6 +556,44 @@ static void fn_printf_va_fl(const char *file, int line, strbuf_release(&buf_payload); } +static void fn_timer(const struct tr2_timer_metadata *meta, + const struct tr2_timer *timer, + int is_final_data) +{ + const char *event_name = is_final_data ? 
"timer" : "th_timer"; + struct strbuf buf_payload = STRBUF_INIT; + double t_total = NS_TO_SEC(timer->total_ns); + double t_min = NS_TO_SEC(timer->min_ns); + double t_max = NS_TO_SEC(timer->max_ns); + + strbuf_addf(&buf_payload, ("name:%s" + " intervals:%"PRIu64 + " total:%8.6f min:%8.6f max:%8.6f"), + meta->name, + timer->interval_count, + t_total, t_min, t_max); + + perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL, + meta->category, &buf_payload); + strbuf_release(&buf_payload); +} + +static void fn_counter(const struct tr2_counter_metadata *meta, + const struct tr2_counter *counter, + int is_final_data) +{ + const char *event_name = is_final_data ? "counter" : "th_counter"; + struct strbuf buf_payload = STRBUF_INIT; + + strbuf_addf(&buf_payload, "name:%s value:%"PRIu64, + meta->name, + counter->value); + + perf_io_write_fl(__FILE__, __LINE__, event_name, NULL, NULL, NULL, + meta->category, &buf_payload); + strbuf_release(&buf_payload); +} + struct tr2_tgt tr2_tgt_perf = { .pdst = &tr2dst_perf, @@ -586,4 +625,6 @@ struct tr2_tgt tr2_tgt_perf = { .pfn_data_fl = fn_data_fl, .pfn_data_json_fl = fn_data_json_fl, .pfn_printf_va_fl = fn_printf_va_fl, + .pfn_timer = fn_timer, + .pfn_counter = fn_counter, }; diff --git a/trace2/tr2_tls.c b/trace2/tr2_tls.c index 7da94aba522..04900bb4c3a 100644 --- a/trace2/tr2_tls.c +++ b/trace2/tr2_tls.c @@ -31,10 +31,11 @@ void tr2tls_start_process_clock(void) tr2tls_us_start_process = getnanotime() / 1000; } -struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, +struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name, uint64_t us_thread_start) { struct tr2tls_thread_ctx *ctx = xcalloc(1, sizeof(*ctx)); + struct strbuf buf = STRBUF_INIT; /* * Implicitly "tr2tls_push_self()" to capture the thread's start @@ -47,12 +48,13 @@ struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, ctx->thread_id = tr2tls_locked_increment(&tr2_next_thread_id); - strbuf_init(&ctx->thread_name, 
0); + strbuf_init(&buf, 0); if (ctx->thread_id) - strbuf_addf(&ctx->thread_name, "th%02d:", ctx->thread_id); - strbuf_addstr(&ctx->thread_name, thread_name); - if (ctx->thread_name.len > TR2_MAX_THREAD_NAME) - strbuf_setlen(&ctx->thread_name, TR2_MAX_THREAD_NAME); + strbuf_addf(&buf, "th%02d:", ctx->thread_id); + strbuf_addstr(&buf, thread_base_name); + if (buf.len > TR2_MAX_THREAD_NAME) + strbuf_setlen(&buf, TR2_MAX_THREAD_NAME); + ctx->thread_name = strbuf_detach(&buf, NULL); pthread_setspecific(tr2tls_key, ctx); @@ -69,9 +71,9 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void) ctx = pthread_getspecific(tr2tls_key); /* - * If the thread-proc did not call trace2_thread_start(), we won't - * have any TLS data associated with the current thread. Fix it - * here and silently continue. + * If the current thread's thread-proc did not call + * trace2_thread_start(), then the thread will not have any + * thread-local storage. Create it now and silently continue. */ if (!ctx) ctx = tr2tls_create_self("unknown", getnanotime() / 1000); @@ -95,7 +97,7 @@ void tr2tls_unset_self(void) pthread_setspecific(tr2tls_key, NULL); - strbuf_release(&ctx->thread_name); + free((char *)ctx->thread_name); free(ctx->array_us_start); free(ctx); } @@ -113,7 +115,7 @@ void tr2tls_pop_self(void) struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); if (!ctx->nr_open_regions) - BUG("no open regions in thread '%s'", ctx->thread_name.buf); + BUG("no open regions in thread '%s'", ctx->thread_name); ctx->nr_open_regions--; } @@ -179,3 +181,13 @@ int tr2tls_locked_increment(int *p) return current_value; } + +void tr2tls_lock(void) +{ + pthread_mutex_lock(&tr2tls_mutex); +} + +void tr2tls_unlock(void) +{ + pthread_mutex_unlock(&tr2tls_mutex); +} diff --git a/trace2/tr2_tls.h b/trace2/tr2_tls.h index b1e327a928e..f9049805d4d 100644 --- a/trace2/tr2_tls.h +++ b/trace2/tr2_tls.h @@ -2,6 +2,14 @@ #define TR2_TLS_H #include "strbuf.h" +#include "trace2/tr2_ctr.h" +#include "trace2/tr2_tmr.h" + +/* + * Notice: 
the term "TLS" refers to "thread-local storage" in the + * Trace2 source files. This usage is borrowed from GCC and Windows. + * There is NO relation to "transport layer security". + */ /* * Arbitry limit for thread names for column alignment. @@ -9,33 +17,40 @@ #define TR2_MAX_THREAD_NAME (24) struct tr2tls_thread_ctx { - struct strbuf thread_name; + const char *thread_name; uint64_t *array_us_start; - int alloc; - int nr_open_regions; /* plays role of "nr" in ALLOC_GROW */ + size_t alloc; + size_t nr_open_regions; /* plays role of "nr" in ALLOC_GROW */ int thread_id; + struct tr2_timer_block timer_block; + struct tr2_counter_block counter_block; + unsigned int used_any_timer:1; + unsigned int used_any_per_thread_timer:1; + unsigned int used_any_counter:1; + unsigned int used_any_per_thread_counter:1; }; /* - * Create TLS data for the current thread. This gives us a place to - * put per-thread data, such as thread start time, function nesting - * and a per-thread label for our messages. - * - * We assume the first thread is "main". Other threads are given - * non-zero thread-ids to help distinguish messages from concurrent - * threads. + * Create thread-local storage for the current thread. * - * Truncate the thread name if necessary to help with column alignment - * in printf-style messages. + * The first thread in the process will have: + * { .thread_id=0, .thread_name="main" } + * Subsequent threads are given a non-zero thread_id and a thread_name + * constructed from the id and a thread base name (which is usually just + * the name of the thread-proc function). For example: + * { .thread_id=10, .thread_name="th10:fsm-listen" } + * This helps to identify and distinguish messages from concurrent threads. + * The ctx.thread_name field is truncated if necessary to help with column + * alignment in printf-style messages. * * In this and all following functions the term "self" refers to the * current thread. 
*/ -struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_name, +struct tr2tls_thread_ctx *tr2tls_create_self(const char *thread_base_name, uint64_t us_thread_start); /* - * Get our TLS data. + * Get the thread-local storage pointer of the current thread. */ struct tr2tls_thread_ctx *tr2tls_get_self(void); @@ -45,7 +60,7 @@ struct tr2tls_thread_ctx *tr2tls_get_self(void); int tr2tls_is_main_thread(void); /* - * Free our TLS data. + * Free the current thread's thread-local storage. */ void tr2tls_unset_self(void); @@ -81,12 +96,12 @@ uint64_t tr2tls_region_elasped_self(uint64_t us); uint64_t tr2tls_absolute_elapsed(uint64_t us); /* - * Initialize the tr2 TLS system. + * Initialize thread-local storage for Trace2. */ void tr2tls_init(void); /* - * Free all tr2 TLS resources. + * Free all Trace2 thread-local storage resources. */ void tr2tls_release(void); @@ -100,4 +115,10 @@ int tr2tls_locked_increment(int *p); */ void tr2tls_start_process_clock(void); +/* + * Explicitly lock/unlock our mutex. + */ +void tr2tls_lock(void); +void tr2tls_unlock(void); + #endif /* TR2_TLS_H */ diff --git a/trace2/tr2_tmr.c b/trace2/tr2_tmr.c new file mode 100644 index 00000000000..786762dfd26 --- /dev/null +++ b/trace2/tr2_tmr.c @@ -0,0 +1,182 @@ +#include "cache.h" +#include "thread-utils.h" +#include "trace2/tr2_tgt.h" +#include "trace2/tr2_tls.h" +#include "trace2/tr2_tmr.h" + +#define MY_MAX(a, b) ((a) > (b) ? (a) : (b)) +#define MY_MIN(a, b) ((a) < (b) ? (a) : (b)) + +/* + * A global timer block to aggregate values from the partial sums from + * each thread. + */ +static struct tr2_timer_block final_timer_block; /* access under tr2tls_mutex */ + +/* + * Define metadata for each stopwatch timer. + * + * This array must match "enum trace2_timer_id" and the values + * in "struct tr2_timer_block.timer[*]". 
+ */ +static struct tr2_timer_metadata tr2_timer_metadata[TRACE2_NUMBER_OF_TIMERS] = { + [TRACE2_TIMER_ID_TEST1] = { + .category = "test", + .name = "test1", + .want_per_thread_events = 0, + }, + [TRACE2_TIMER_ID_TEST2] = { + .category = "test", + .name = "test2", + .want_per_thread_events = 1, + }, + + /* Add additional metadata before here. */ +}; + +void tr2_start_timer(enum trace2_timer_id tid) +{ + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); + struct tr2_timer *t = &ctx->timer_block.timer[tid]; + + t->recursion_count++; + if (t->recursion_count > 1) + return; /* ignore recursive starts */ + + t->start_ns = getnanotime(); +} + +void tr2_stop_timer(enum trace2_timer_id tid) +{ + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); + struct tr2_timer *t = &ctx->timer_block.timer[tid]; + uint64_t ns_now; + uint64_t ns_interval; + + assert(t->recursion_count > 0); + + t->recursion_count--; + if (t->recursion_count) + return; /* still in recursive call(s) */ + + ns_now = getnanotime(); + ns_interval = ns_now - t->start_ns; + + t->total_ns += ns_interval; + + /* + * min_ns was initialized to zero (in the xcalloc()) rather + * than UINT_MAX when the block of timers was allocated, + * so we should always set both the min_ns and max_ns values + * the first time that the timer is used. + */ + if (!t->interval_count) { + t->min_ns = ns_interval; + t->max_ns = ns_interval; + } else { + t->min_ns = MY_MIN(ns_interval, t->min_ns); + t->max_ns = MY_MAX(ns_interval, t->max_ns); + } + + t->interval_count++; + + ctx->used_any_timer = 1; + if (tr2_timer_metadata[tid].want_per_thread_events) + ctx->used_any_per_thread_timer = 1; +} + +void tr2_update_final_timers(void) +{ + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); + enum trace2_timer_id tid; + + if (!ctx->used_any_timer) + return; + + /* + * Accessing `final_timer_block` requires holding `tr2tls_mutex`. + * We assume that our caller is holding the lock. 
+ */ + + for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) { + struct tr2_timer *t_final = &final_timer_block.timer[tid]; + struct tr2_timer *t = &ctx->timer_block.timer[tid]; + + if (t->recursion_count) { + /* + * The current thread is exiting with + * timer[tid] still running. + * + * Technically, this is a bug, but I'm going + * to ignore it. + * + * I don't think it is worth calling die() + * for. I don't think it is worth killing the + * process for this bookkeeping error. We + * might want to call warning(), but I'm going + * to wait on that. + * + * The downside here is that total_ns won't + * include the current open interval (now - + * start_ns). I can live with that. + */ + } + + if (!t->interval_count) + continue; /* this timer was not used by this thread */ + + t_final->total_ns += t->total_ns; + + /* + * final_timer_block.timer[tid].min_ns was initialized to + * was initialized to zero rather than UINT_MAX, so we should + * always set both the min_ns and max_ns values the first time + * that we add a partial sum into it. + */ + if (!t_final->interval_count) { + t_final->min_ns = t->min_ns; + t_final->max_ns = t->max_ns; + } else { + t_final->min_ns = MY_MIN(t_final->min_ns, t->min_ns); + t_final->max_ns = MY_MAX(t_final->max_ns, t->max_ns); + } + + t_final->interval_count += t->interval_count; + } +} + +void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply) +{ + struct tr2tls_thread_ctx *ctx = tr2tls_get_self(); + enum trace2_timer_id tid; + + if (!ctx->used_any_per_thread_timer) + return; + + /* + * For each timer, if the timer wants per-thread events and + * this thread used it, emit it. 
+ */ + for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) + if (tr2_timer_metadata[tid].want_per_thread_events && + ctx->timer_block.timer[tid].interval_count) + fn_apply(&tr2_timer_metadata[tid], + &ctx->timer_block.timer[tid], + 0); +} + +void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply) +{ + enum trace2_timer_id tid; + + /* + * Accessing `final_timer_block` requires holding `tr2tls_mutex`. + * We assume that our caller is holding the lock. + */ + + for (tid = 0; tid < TRACE2_NUMBER_OF_TIMERS; tid++) + if (final_timer_block.timer[tid].interval_count) + fn_apply(&tr2_timer_metadata[tid], + &final_timer_block.timer[tid], + 1); +} diff --git a/trace2/tr2_tmr.h b/trace2/tr2_tmr.h new file mode 100644 index 00000000000..d5753576134 --- /dev/null +++ b/trace2/tr2_tmr.h @@ -0,0 +1,140 @@ +#ifndef TR2_TMR_H +#define TR2_TMR_H + +#include "trace2.h" +#include "trace2/tr2_tgt.h" + +/* + * Define a mechanism to allow "stopwatch" timers. + * + * Timers can be used to measure "interesting" activity that does not + * fit the "region" model, such as code called from many different + * regions (like zlib) and/or where data for individual calls are not + * interesting or are too numerous to be efficiently logged. + * + * Timer values are accumulated during program execution and emitted + * to the Trace2 logs at program exit. + * + * To make this model efficient, we define a compile-time fixed set of + * timers and timer ids using a "timer block" array in thread-local + * storage. This gives us constant time access to each timer within + * each thread, since we want start/stop operations to be as fast as + * possible. This lets us avoid the complexities of dynamically + * allocating a timer on the first use by a thread and/or possibly + * sharing that timer definition with other concurrent threads. + * However, this does require that we define time the set of timers at + * compile time. 
+ * + * Each thread uses the timer block in its thread-local storage to + * compute partial sums for each timer (without locking). When a + * thread exits, those partial sums are (under lock) added to the + * global final sum. + * + * Using this "timer block" model costs ~48 bytes per timer per thread + * (we have about six uint64 fields per timer). This does increase + * the size of the thread-local storage block, but it is allocated (at + * thread create time) and not on the thread stack, so I'm not worried + * about the size. + * + * Partial sums for each timer are optionally emitted when a thread + * exits. + * + * Final sums for each timer are emitted between the "exit" and + * "atexit" events. + * + * A parallel "timer metadata" table contains the "category" and "name" + * fields for each timer. This eliminates the need to include those + * args in the various timer APIs. + */ + +/* + * The definition of an individual timer and used by an individual + * thread. + */ +struct tr2_timer { + /* + * Total elapsed time for this timer in this thread in nanoseconds. + */ + uint64_t total_ns; + + /* + * The maximum and minimum interval values observed for this + * timer in this thread. + */ + uint64_t min_ns; + uint64_t max_ns; + + /* + * The value of the clock when this timer was started in this + * thread. (Undefined when the timer is not active in this + * thread.) + */ + uint64_t start_ns; + + /* + * Number of times that this timer has been started and stopped + * in this thread. (Recursive starts are ignored.) + */ + uint64_t interval_count; + + /* + * Number of nested starts on the stack in this thread. (We + * ignore recursive starts and use this to track the recursive + * calls.) + */ + unsigned int recursion_count; +}; + +/* + * Metadata for a timer. + */ +struct tr2_timer_metadata { + const char *category; + const char *name; + + /* + * True if we should emit per-thread events for this timer + * when individual threads exit. 
+ */ + unsigned int want_per_thread_events:1; +}; + +/* + * A compile-time fixed-size block of timers to insert into + * thread-local storage. This wrapper is used to avoid quirks + * of C and the usual need to pass an array size argument. + */ +struct tr2_timer_block { + struct tr2_timer timer[TRACE2_NUMBER_OF_TIMERS]; +}; + +/* + * Private routines used by trace2.c to actually start/stop an + * individual timer in the current thread. + */ +void tr2_start_timer(enum trace2_timer_id tid); +void tr2_stop_timer(enum trace2_timer_id tid); + +/* + * Add the current thread's timer data to the global totals. + * This is called during thread-exit. + * + * Caller must be holding the tr2tls_mutex. + */ +void tr2_update_final_timers(void); + +/* + * Emit per-thread timer data for the current thread. + * This is called during thread-exit. + */ +void tr2_emit_per_thread_timers(tr2_tgt_evt_timer_t *fn_apply); + +/* + * Emit global total timer values. + * This is called during atexit handling. + * + * Caller must be holding the tr2tls_mutex. + */ +void tr2_emit_final_timers(tr2_tgt_evt_timer_t *fn_apply); + +#endif /* TR2_TMR_H */ |
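The comments in `tr2_ctr.h` and `tr2_tmr.h` describe one model for both facilities: each thread adds to a fixed-size block in its own thread-local storage without locking, and when the thread exits its partial sums are added to a global block under a mutex. The following is a minimal standalone sketch of that model, not the Trace2 implementation. Every `demo_*` and `DEMO_*` name is hypothetical, and it assumes C11 `_Thread_local` in place of the `pthread_getspecific()` context that `tr2_tls.c` uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; this only illustrates the model. */
enum demo_counter_id {
	DEMO_COUNTER_LSTAT,
	DEMO_COUNTER_OPEN,
	DEMO_NUMBER_OF_COUNTERS /* compile-time fixed set of counters */
};

struct demo_counter_block {
	uint64_t value[DEMO_NUMBER_OF_COUNTERS];
};

#define DEMO_MAX_THREADS 16

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct demo_counter_block demo_final_block; /* access under demo_mutex */

/* Assumption: C11 thread-local storage stands in for the tr2tls context. */
static _Thread_local struct demo_counter_block demo_thread_block;

struct demo_work {
	enum demo_counter_id cid;
	uint64_t adds;
};

/* Lock-free: only the current thread's partial sum is touched. */
static void demo_counter_add(enum demo_counter_id cid, uint64_t value)
{
	assert(cid < DEMO_NUMBER_OF_COUNTERS);
	demo_thread_block.value[cid] += value;
}

/* Fold this thread's partial sums into the totals; caller holds demo_mutex. */
static void demo_update_final_counters(void)
{
	int cid;

	for (cid = 0; cid < DEMO_NUMBER_OF_COUNTERS; cid++)
		demo_final_block.value[cid] += demo_thread_block.value[cid];
}

static void *demo_thread_proc(void *arg)
{
	const struct demo_work *w = arg;
	uint64_t i;

	for (i = 0; i < w->adds; i++)
		demo_counter_add(w->cid, 1);

	/* Thread exit: the only point where the lock is taken. */
	pthread_mutex_lock(&demo_mutex);
	demo_update_final_counters();
	pthread_mutex_unlock(&demo_mutex);
	return NULL;
}

/*
 * Start nr_threads workers that each add 1 to `cid` `adds` times, wait
 * for them, and return the accumulated final sum for `cid`.  The final
 * block is global, so repeated calls on the same `cid` keep adding up.
 */
uint64_t demo_run(enum demo_counter_id cid, int nr_threads, uint64_t adds)
{
	pthread_t th[DEMO_MAX_THREADS];
	struct demo_work w = { cid, adds };
	uint64_t total;
	int i;

	if (nr_threads < 1 || nr_threads > DEMO_MAX_THREADS)
		abort();

	for (i = 0; i < nr_threads; i++)
		if (pthread_create(&th[i], NULL, demo_thread_proc, &w))
			abort();
	for (i = 0; i < nr_threads; i++)
		pthread_join(th[i], NULL);

	pthread_mutex_lock(&demo_mutex);
	total = demo_final_block.value[cid];
	pthread_mutex_unlock(&demo_mutex);
	return total;
}
```

Calling `demo_run(DEMO_COUNTER_LSTAT, 4, 1000)` in a fresh process has four threads contribute 1000 each. The total is only complete once every worker has been joined, which is the same reason the comment on `trace2_counter_add()` gives for not returning the new counter value. As in `tr2_update_final_counters()`, the merge routine here assumes its caller already holds the lock.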