Because higher-priority threads can starve lower-priority ones, it is good practice to offload any non-urgent work from them to lower-priority threads.
In this exercise, we will create and initialize a workqueue to offload work from a higher priority thread.
1. In the GitHub repository for this course, open the base code for this exercise, found in l7/l7_e3 of whichever version directory you are using.
Threads with different priorities
2. Define the three thread priorities used in this exercise so that thread0 has a higher priority than thread1. The workqueue thread should have the lowest priority, since it executes the offloaded (non-urgent) work. Remember that a higher priority corresponds to a lower numerical value.
#define THREAD0_PRIORITY 2
#define THREAD1_PRIORITY 3
#define WORKQ_PRIORITY 4
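The intended ordering can also be checked at build time. The following is a minimal host-side sketch in plain C11 without any Zephyr headers; is_higher_priority() is a hypothetical helper introduced here for illustration only. In a Zephyr build, the BUILD_ASSERT() macro would be the more usual choice than static_assert.

```c
#include <assert.h>
#include <stdbool.h>

#define THREAD0_PRIORITY 2
#define THREAD1_PRIORITY 3
#define WORKQ_PRIORITY   4

/* Hypothetical helper: a numerically lower value means a higher priority. */
static inline bool is_higher_priority(int a, int b)
{
    return a < b;
}

/* Fail the build if someone reorders the priorities by accident. */
static_assert(THREAD0_PRIORITY < THREAD1_PRIORITY,
              "thread0 must have a higher priority than thread1");
static_assert(THREAD1_PRIORITY < WORKQ_PRIORITY,
              "the workqueue thread must have the lowest priority");
```

With these checks in place, swapping two of the values by mistake stops the build instead of silently changing the scheduling behavior.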
3. thread0 is already provided in the codebase. It declares the local variables time_stamp and delta_time. Then, in a while-loop, it calls the kernel function k_uptime_get() to capture a time stamp, emulates some work, and uses k_uptime_delta() to get and print how long this round of work took. It then sleeps for 20 ms and repeats forever.
void thread0(void)
{
    /* int64_t matches what k_uptime_get() returns and k_uptime_delta() expects */
    int64_t time_stamp;
    int64_t delta_time;

    while (1) {
        time_stamp = k_uptime_get();
        emulate_work();
        delta_time = k_uptime_delta(&time_stamp);
        printk("thread0 yielding this round in %lld ms\n", delta_time);
        k_msleep(20);
    }
}
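To make the time-stamp bookkeeping concrete, here is a small host-side model of what k_uptime_delta() does with its reference argument: it returns the time elapsed since the reference and then moves the reference forward to the current uptime. This is a sketch, not the kernel implementation. The name uptime_delta_model() and the explicit now_ms parameter are assumptions made so that it runs without Zephyr; the real call reads the uptime itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Host-side model of the k_uptime_delta() bookkeeping.
 * `now_ms` stands in for the value the kernel would read from its uptime clock.
 */
static int64_t uptime_delta_model(int64_t *reftime, int64_t now_ms)
{
    assert(reftime != NULL);

    int64_t delta = now_ms - *reftime; /* time elapsed since the reference */

    *reftime = now_ms;                 /* reference moves forward to "now" */
    return delta;
}
```

For example, with a reference of 100 ms and a current uptime of 126 ms, the model returns 26 and leaves the reference at 126.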
4. thread1 should do exactly the same thing. Add the following code for thread1:
void thread1(void)
{
    /* int64_t matches what k_uptime_get() returns and k_uptime_delta() expects */
    int64_t time_stamp;
    int64_t delta_time;

    while (1) {
        time_stamp = k_uptime_get();
        emulate_work();
        delta_time = k_uptime_delta(&time_stamp);
        printk("thread1 yielding this round in %lld ms\n", delta_time);
        k_msleep(20);
    }
}
Note that thread1 will take longer to get through emulate_work(), since it has a lower priority and can be preempted by thread0.
5. Before the thread entry functions, define an inline function to emulate work that processes a loop without yielding or sleeping.
static inline void emulate_work(void)
{
    /* volatile keeps the compiler from optimizing the busy loop away */
    for (volatile int count_out = 0; count_out < 150000; count_out++) {
    }
}
This function should take roughly 24 ms to finish on an nRF52840 running at 64 MHz and roughly 12 ms on an nRF54L15 running at 128 MHz.
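The relation between the two figures is plain clock arithmetic, sketched below in host-side C. The helper names are hypothetical, and the calculation assumes the loop costs the same number of CPU cycles per iteration on both devices. That is only an approximation, since the two cores differ, so treat the result as a rough estimate.

```c
#include <assert.h>
#include <stdint.h>

#define EMULATE_WORK_ITERATIONS 150000u

/* CPU cycles spent by a busy loop that runs for `ms` milliseconds at `clock_hz`. */
static uint64_t busy_cycles(uint32_t ms, uint32_t clock_hz)
{
    return (uint64_t)ms * (clock_hz / 1000u);
}

/* Milliseconds needed to spend `cycles` CPU cycles at `clock_hz`. */
static uint32_t busy_ms(uint64_t cycles, uint32_t clock_hz)
{
    assert(clock_hz >= 1000u);
    return (uint32_t)(cycles / (clock_hz / 1000u));
}

/* Average cycles per loop iteration, scaled by 100 to stay in integer math. */
static uint32_t cycles_per_iteration_x100(uint64_t cycles)
{
    return (uint32_t)(cycles * 100u / EMULATE_WORK_ITERATIONS);
}
```

Under that assumption, 24 ms at 64 MHz is 1,536,000 cycles, or about 10.24 cycles per iteration, and the same cycle count at 128 MHz works out to 12 ms.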
6. Build the application and flash it on your development kit. Using a serial terminal you should now see the below output:
*** Booting nRF Connect SDK 2.6.1-3758bcbfa5cd ***
thread0 yielding this round in 26 ms
thread0 yielding this round in 26 ms
thread1 yielding this round in 55 ms
thread0 yielding this round in 26 ms
thread0 yielding this round in 26 ms
thread1 yielding this round in 55 ms
thread0 yielding this round in 26 ms
thread0 yielding this round in 26 ms
thread1 yielding this round in 57 ms
thread0 yielding this round in 26 ms
thread0 yielding this round in 26 ms
thread1 yielding this round in 56 ms
thread0 yielding this round in 26 ms
thread0 yielding this round in 26 ms
thread1 yielding this round in 57 ms
thread0 yielding this round in 26 ms
thread0 yielding this round in 26 ms
thread1 yielding this round in 55 ms
thread0 yielding this round in 26 ms
thread0 yielding this round in 26 ms
thread1 yielding this round in 57 ms
You can see that the higher priority thread0 completes emulate_work() in about 25-26 ms, but thread1 takes more than double that time. This is because thread0 keeps preempting thread1.
The timeline of threads should look something like below:
Offloading work from high priority task
Since thread0 is processing non-urgent work, it is not good practice to block other threads just to perform this work. Let's offload the non-urgent emulate_work() into a lower priority workqueue thread.
7. We need to wrap our work (emulate_work()) in a work item and push it to a specific workqueue. This is done by creating a work_info structure and a handler function, offload_function(), that only runs emulate_work().
struct work_info {
    struct k_work work;
    char name[25];
} my_work;

void offload_function(struct k_work *work_term)
{
    emulate_work();
}
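Embedding the work item inside a larger structure lets a handler find its way back to the surrounding context, such as the name field, from the pointer it receives. The sketch below shows the idea in host-side C. The type k_work_stub is a stand-in for Zephyr's struct k_work, and work_info_set_name(), work_info_from() and offload_name() are hypothetical helpers introduced for illustration. In Zephyr code, the CONTAINER_OF() macro performs the same pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for Zephyr's struct k_work so the sketch builds on a host. */
struct k_work_stub {
    int pending;
};

struct work_info {
    struct k_work_stub work;
    char name[25];
};

/* Copy a name into the work_info, always leaving room for the terminator. */
static void work_info_set_name(struct work_info *info, const char *name)
{
    assert(info != NULL && name != NULL);
    strncpy(info->name, name, sizeof(info->name) - 1);
    info->name[sizeof(info->name) - 1] = '\0';
}

/* Step back from the embedded member to the structure that contains it. */
static struct work_info *work_info_from(struct k_work_stub *work_term)
{
    assert(work_term != NULL);
    return (struct work_info *)((char *)work_term -
                                offsetof(struct work_info, work));
}

/* What a handler could do with the recovered context. */
static const char *offload_name(struct k_work_stub *work_term)
{
    return work_info_from(work_term)->name;
}
```

The handler in this exercise ignores its argument, so this pattern only matters once a handler needs per-item data.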
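8. In the entry function for thread0, before the while-loop, start the workqueue using k_work_queue_start(). Then initialize the work item using k_work_init() to connect it to its handler, offload_function(). This assumes that the workqueue object offload_work_q and its stack my_stack_area are already defined at file scope.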
k_work_queue_start(&offload_work_q, my_stack_area,
                   K_THREAD_STACK_SIZEOF(my_stack_area), WORKQ_PRIORITY,
                   NULL);
strcpy(my_work.name, "Thread0 emulate_work()");
k_work_init(&my_work.work, offload_function);
9. Instead of running emulate_work() in the while-loop, submit the work item to the workqueue using k_work_submit_to_queue():
k_work_submit_to_queue(&offload_work_q, &my_work.work);
thread0 is now offloading the processing of emulate_work() to the lower priority worker thread, which means it does much less processing in its high priority context before it goes to sleep (for 20 ms). This, in turn, should translate to more processing time for thread1, thanks to fewer interruptions from thread0.
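The submit-now, run-later behavior can be modeled on a host with a small queue of pending items. This is a simplified sketch and not how the Zephyr kernel is implemented: all names are hypothetical, the fixed QUEUE_DEPTH and the -1 "queue full" result exist only in this model, and drain_model() stands in for the low-priority worker thread. It does mirror one documented property of k_work_submit_to_queue(): submitting an item that is already queued does not queue it a second time. That can matter here, because thread0 submits the same item every 20 ms and the worker may not have finished the previous round yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 4

struct work_item;
typedef void (*work_handler_t)(struct work_item *item);

struct work_item {
    work_handler_t handler;
    bool pending; /* true while the item sits in the queue */
};

struct work_queue_model {
    struct work_item *slots[QUEUE_DEPTH];
    size_t head;
    size_t count;
};

/* Returns 1 if queued, 0 if already pending, -1 if the model's queue is full. */
static int submit_model(struct work_queue_model *q, struct work_item *item)
{
    assert(q != NULL && item != NULL && item->handler != NULL);
    if (item->pending) {
        return 0;
    }
    if (q->count == QUEUE_DEPTH) {
        return -1;
    }
    q->slots[(q->head + q->count) % QUEUE_DEPTH] = item;
    q->count++;
    item->pending = true;
    return 1;
}

/* Stands in for the worker thread: runs every queued handler, returns how many ran. */
static int drain_model(struct work_queue_model *q)
{
    int ran = 0;

    while (q->count > 0) {
        struct work_item *item = q->slots[q->head];

        q->head = (q->head + 1) % QUEUE_DEPTH;
        q->count--;
        item->pending = false; /* cleared first so the item can be resubmitted */
        item->handler(item);
        ran++;
    }
    return ran;
}

static int emulate_work_calls;

/* Model handler: counts invocations instead of burning CPU time. */
static void offload_function_model(struct work_item *item)
{
    (void)item;
    emulate_work_calls++;
}
```

In the model, the submitting side returns immediately and the handler only runs once the worker drains the queue, which is the reason thread0's round drops to almost no time.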
10. Build the application and flash it on your development kit. Using a serial terminal, you should see the below output:
*** Booting nRF Connect SDK 2.6.1-3758bcbfa5cd ***
thread0 yielding this round in 0 ms
thread0 yielding this round in 0 ms
thread1 yielding this round in 31 ms
thread0 yielding this round in 0 ms
thread0 yielding this round in 0 ms
thread1 yielding this round in 3thread0 yielding this round in 0 ms
3 ms
thread0 yielding this round in 0 ms
thread0 yielding this round in 0 ms
thread1 yielding this round in 30 ms
thread0 yielding this round in 0 ms
thread0 yielding this round in 0 ms
thread1 yielding this round in 3thread0 yielding this round in 0 ms
3 ms
thread0 yielding this round in 0 ms
thread0 yielding this round in 0 ms
thread1 yielding this round in 29 ms
thread0 yielding this round in 0 ms
thread0 yielding this round in 0 ms
thread1 yielding this round in 33 ms
thread0 yielding this round in 0 ms
thread0 yielding this round in 0 ms
On the nRF54L15, it should take about half the time shown in the snippet above, as it runs at 128 MHz.
The timeline of the threads looks something like this:
As you can see, thread0 now completes its round in less than a millisecond before it sleeps, giving lower priority threads more time to run. This is acceptable for thread0, since it can live with postponed execution of emulate_work(). Also notice that thread1 now takes much less time to finish its round of work than it did when thread0 was not using the workqueue to offload work. This is an example of good architecture: only urgent work is processed at higher priorities, and non-urgent work is offloaded to an appropriate lower priority. As an application designer working with an RTOS, you should be aware of the kernel services available to the application and make the best use of them to avoid unnecessary latencies.
The solution for this exercise can be found in the GitHub repository, in l7/l7_e3_sol of whichever version directory you are using.