This article describes the creation of a new device mapper target called rdelay-uniform, based on the existing delay target, that delays I/O for a random time within a predefined range, following a uniform distribution. The second part presents a statistical study of the delays observed from an application's point of view for both the delay and rdelay-uniform mappings, since we expect the latter's delays to be uniformly distributed.
The device mapper is a kernel driver that provides logical block device management. It supports a variety of mapping targets, including linear mapping (assembles linear ranges of devices), the error mapping (errors any I/O that goes in) or the mirror mapping (mirrors data across devices). Targets are enabled in the kernel configuration, and can be compiled as modules. The dmsetup targets command lists the supported targets on the system:
# dmsetup targets
delay v1.2.1
verity v1.2.0
multipath v1.9.0
flakey v1.3.1
crypt v1.14.0
cache v1.6.0
zero v1.1.0
striped v1.5.1
linear v1.2.1
error v1.3.0
Among the provided targets is the delay target, a mapping that delays reads and/or writes for a specified duration. The delay is fixed, so each I/O is held for the same amount of time before it is released for service. While this is useful for testing (to simulate a slow device, for example), an enhancement would be the ability to delay I/Os for a random time within a given range (to add some instability to the I/O service time, for example). To achieve this, we propose to create a new device mapper target, based on the delay target, called rdelay-uniform.
An application that uses this target should see its I/Os delayed uniformly between a minimum and a maximum duration configured in the mapping. To verify that, we use the rl program, which reads a single block on a device and returns the time the read took. The program loops, so after running for some time it yields a usable sample of read delays for the block device. The empirical distribution function computed from this sample should be linear between the minimum and the maximum, which is what a uniform distribution predicts.
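The source of the rl program is not shown in this article. As a rough idea of what such a measurement involves, here is a minimal user-space sketch, assuming a 4096-byte block read at offset 0 and a monotonic clock. The names timed_block_read and RL_BLOCK_SIZE are hypothetical and are not taken from rl itself.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT and pread on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define RL_BLOCK_SIZE 4096     /* assumed block size, also used for alignment */

/*
 * Read one block at offset 0 of `path` and return the elapsed time in
 * seconds, or a negative value on error. With use_direct != 0 the file is
 * opened with O_DIRECT so the read is not served from the page cache.
 */
double timed_block_read(const char *path, int use_direct)
{
    struct timespec t0, t1;
    void *buf = NULL;
    double elapsed = -1.0;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY | (use_direct ? O_DIRECT : 0));
    if (fd < 0)
        return -1.0;

    /* O_DIRECT requires an aligned buffer. */
    if (posix_memalign(&buf, RL_BLOCK_SIZE, RL_BLOCK_SIZE) != 0) {
        close(fd);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = pread(fd, buf, RL_BLOCK_SIZE, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (n > 0)
        elapsed = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    free(buf);
    close(fd);
    return elapsed;
}
```

Calling this function in a loop with a pause between iterations, and printing the current time next to the returned value, would give a two-column output of the kind shown later in this article. Bypassing the page cache matters here, because a cached block would be returned without going through the mapping.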
In the device mapper source code, located in drivers/md in the kernel source tree, each target is implemented in a separate file that can be compiled as a kernel module (or built directly into the kernel binary).
To create the rdelay-uniform target, we then need to create the dm-rdelay-uniform module. We start by copying the dm-delay.c source file. The key aspect for us in this file is the following code defined in the delay_bio function:
delayed->expires = expires = jiffies + msecs_to_jiffies(delay);
that sets an expiration time, calculated with the current time (jiffies) and the configured delay (delay), for each I/O. Basically this is where we need to add randomness.
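The expiration time is expressed in jiffies, so the millisecond delay is converted to timer ticks, and the kernel's msecs_to_jiffies rounds up. The sketch below is a user-space approximation of that conversion, not the kernel implementation; the function name ms_to_ticks and the hz parameter (standing in for CONFIG_HZ) are assumptions made for illustration.

```c
#include <assert.h>

/*
 * Approximate msecs_to_jiffies() in user space: convert milliseconds to
 * timer ticks, rounding up, for a tick rate of `hz` ticks per second.
 * Assumes hz <= 1000 and that ms * hz does not overflow.
 */
unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
    assert(hz > 0 && hz <= 1000);
    return ((unsigned long)ms * hz + 999UL) / 1000UL;
}
```

With hz = 1000 each millisecond is one tick, so every value of the delay is distinguishable. With hz = 250 a tick lasts 4 ms, so for example 45 ms and 48 ms both map to 12 ticks. The tick rate of the test kernel is not stated in this article, so whether this quantization affects the measurements presented later is left open.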
We use the function get_random_bytes to get a random number in the kernel:
unsigned int i = 0;
get_random_bytes(&i, sizeof(int));
This gets a random number between 0 and UINT_MAX (from limits.h). To get a random number within a range of [0:maxdev-1], we use the modulo operation:
unsigned random = 0;
random = i % maxdev;
Since the range starts with 0, we have exactly maxdev numbers in the range.
We then replace the expiration code shown above with the following:
delayed->expires = expires = jiffies + msecs_to_jiffies(delay - maxdev/2 + random);
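That delays each I/O by a value between delay - maxdev/2 and delay + maxdev/2 milliseconds. Because random lies in [0:maxdev-1] and the division is an integer division, the range is symmetric only when maxdev is odd, which is why the constructor in the patch adds 1 to an even value.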
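To make the arithmetic concrete, the expression can be reproduced in user space. This is a minimal sketch: the function name randomized_delay_ms is hypothetical, and the raw argument stands in for the value that get_random_bytes fills in inside the kernel.

```c
#include <assert.h>

/*
 * Same arithmetic as the modified delay_bio: shift the base delay down by
 * maxdev/2 (integer division), then add an offset in [0, maxdev - 1].
 */
unsigned int randomized_delay_ms(unsigned int delay, unsigned int maxdev,
                                 unsigned int raw)
{
    assert(maxdev > 0 && maxdev / 2 <= delay);
    return delay - maxdev / 2 + raw % maxdev;
}
```

With delay = 50 and maxdev = 11 the result covers 45 to 55 ms. With maxdev = 10 it would cover 45 to 54 ms only, one millisecond short on the upper side.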
We then modify the constructor function (the one called by dmsetup create) to read the new parameter rand. In addition, we modify the Kconfig and the Makefile in drivers/md to add a config option that builds the source file as a module.
Finally, we create a reusable kernel patch:
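The constructor in the patch parses each numeric parameter with sscanf and a "%u%c" format, accepting the argument only when exactly one conversion succeeds. The same idiom can be tried in user space. In the sketch below, the helper name parse_uint_arg is hypothetical; only the format string and the comparison with 1 come from the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return 0 and store the value in *out when `arg` is a plain unsigned
 * integer, -1 otherwise. Note that *out may be written even on failure.
 */
int parse_uint_arg(const char *arg, unsigned int *out)
{
    char dummy;

    assert(arg != NULL && out != NULL);

    /*
     * "%u" converts the number and "%c" tries to read one more character.
     * A return value of 2 means trailing characters followed the number,
     * 0 or EOF means no number was found; only 1 is accepted.
     */
    if (sscanf(arg, "%u%c", out, &dummy) != 1)
        return -1;
    return 0;
}
```

This is why a table line such as "50 10x" is rejected by the target instead of being silently read as 10.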
diff -uprN -X linux-4.1.3-rdelay/Documentation/dontdiff linux-4.1.3-vanilla/ linux-4.1.3-rdelay/ > patch-4.1.3-rdelay-uniform-0.1
The rdelay-uniform patch has been tested on the Linux kernel 4.1.3. This section explains how to obtain the patch and install it with the appropriate kernel.
Get the kernel 4.1.3:
wget https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.3.tar.xz
Get the rdelay-uniform patch:
wget https://depot.gukinet.com/projects/rdelay-uniform/src/patch-4.1.3-rdelay-uniform-0.1
Untar the kernel:
tar -xf linux-4.1.3.tar.xz
Go to the kernel folder:
cd linux-4.1.3
Apply the patch:
patch -p1 < ../patch-4.1.3-rdelay-uniform-0.1
Configure the kernel:
make menuconfig
Go to Device Drivers, Multiple devices driver support (RAID and LVM) and select I/O random delaying target (uniform distribution):

Compile the kernel:
make
Install the kernel:
cp arch/x86/boot/bzImage /boot/vmlinuz-4.1.3
Install the modules:
make modules_install
Re-create the initrd:
mkinitramfs -o /boot/initrd-4.1.3 4.1.3
Update the boot loader and reboot:
reboot
Once rebooted, load the dm-rdelay-uniform module:
modprobe dm-rdelay-uniform
The dm-rdelay-uniform module is an adaptation of the dm-delay module that takes advantage of the kernel function get_random_bytes to randomly modify the delay of each I/O going through the mapping. The delay is no longer fixed but varies within boundaries that you can configure.
Once the dm-rdelay-uniform module is loaded you should see the rdelay-uniform target in the list of available device mapper targets:
# dmsetup targets | grep rdelay-uniform
rdelay-uniform v1.0.0
You can now create a mapping with this target. The parameters for the mapping table are:
<device> <offset> <delay> <rand> [<write_device> <write_offset> <write_delay> <write_rand>]
The mapping delays reads and/or writes and optionally maps them to different devices. The delay is not fixed but follows a uniform distribution centered on the delay parameter. The minimum and maximum of the uniform distribution are defined by the rand parameter of the mapping:
minimum = delay - rand/2
maximum = delay + rand/2
Note that with separate write parameters in the mapping table, the first set is only used for reads. Delays are specified in milliseconds.
Our goal now is to test the dm-rdelay-uniform module and check that we see the delays corresponding to the parameters we actually set.
To do that, we create two mappings, one with the delay target, and one with the rdelay-uniform target. We then run a program that reads a specific block on those devices and reports the timing for each I/O completion.
We use a RAM disk as a backend block device to minimize any additional delay in the I/O completion and get more accurate results. To add predefined RAM drives to the system, load the block RAM disk module:
modprobe brd
Then create the first mapping on /dev/ram0 with the delay target. We set a delay of 50 ms:
echo "0 `blockdev --getsize /dev/ram0` delay /dev/ram0 0 50" | \
dmsetup create map0
Create the second mapping on /dev/ram1 with the rdelay-uniform target. We set the same delay of 50 ms and add 10 ms of randomness around the delay:
echo "0 `blockdev --getsize /dev/ram1` rdelay-uniform /dev/ram1 0 50 10" | \
dmsetup create map1
Check that the mappings are listed in the device mapper table with the correct parameters:
# dmsetup table | grep delay
map0: 0 8192 delay 1:0 0 50
map1: 0 8192 rdelay-uniform 1:1 0 50 10
We run the test on map0:
# ./rl /dev/mapper/map0
1456587672.587609 0.049894
1456587673.590154 0.051221
1456587674.591308 0.050237
1456587675.593189 0.051551
1456587676.595258 0.050277
1456587677.596735 0.049792
1456587678.598961 0.049664
1456587679.600372 0.050168
1456587680.602670 0.049919
1456587681.603933 0.049607
The first column of the output is the time in seconds since the epoch. The second column is the time in seconds that the read took to complete. We see that the read delays are close to 50 ms.
Test on map1:
# ./rl /dev/mapper/map1
1456746119.599286 0.047301
1456746120.600688 0.049923
1456746121.601991 0.053548
1456746122.603698 0.049852
1456746123.605175 0.052385
1456746124.606420 0.047116
1456746125.608179 0.046524
1456746126.609440 0.051116
1456746127.610935 0.051705
1456746128.612308 0.054239
We see that the delays vary around 50 ms, between 45 and 55 ms, which is the expected behaviour.
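A quick way to check these two short runs is to compute the minimum, maximum and mean of the second column. The sketch below does that for the twenty values printed by rl above. The struct and function names are hypothetical, and ten points per run are of course too few for any statistical conclusion.

```c
#include <assert.h>
#include <stddef.h>

struct sample_stats {
    double min;
    double max;
    double mean;
};

/* Minimum, maximum and arithmetic mean of n samples (n > 0). */
struct sample_stats summarize(const double *x, size_t n)
{
    struct sample_stats s;
    double sum = 0.0;

    assert(x != NULL && n > 0);

    s.min = s.max = x[0];
    for (size_t i = 0; i < n; i++) {
        if (x[i] < s.min)
            s.min = x[i];
        if (x[i] > s.max)
            s.max = x[i];
        sum += x[i];
    }
    s.mean = sum / (double)n;
    return s;
}

/* Read completion times in seconds, copied from the two rl runs. */
static const double map0_times[] = {
    0.049894, 0.051221, 0.050237, 0.051551, 0.050277,
    0.049792, 0.049664, 0.050168, 0.049919, 0.049607,
};
static const double map1_times[] = {
    0.047301, 0.049923, 0.053548, 0.049852, 0.052385,
    0.047116, 0.046524, 0.051116, 0.051705, 0.054239,
};
```

On these values the mean for map0 is about 50.2 ms, and the map1 times range from about 46.5 ms to 54.2 ms with a mean of about 50.4 ms, all inside the configured 45 to 55 ms window.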
We saw in the previous section that the time of completion of the read function for a mapping using the delay target with a delay parameter of 50 ms is indeed close to 50 ms.
In this section we want to know how this read completion time is distributed, in order to compare it with the distribution of the rdelay-uniform target.
In order to compute this empirical distribution, we let the rl program run for 10 minutes on the map0 device to get enough data for a significant result.
The result is that the read completion time follows an Erlang distribution (with a shape parameter equal to 4). The mean value computed from this fit is 50.23 ms and the standard deviation is 0.41 ms:

This graph confirms that the delay target delays the I/O completion by an amount whose average is close to its delay parameter.
We ran the same rl program on the map1 device as for the delay mapping and computed the empirical distribution function of the results for the rdelay-uniform target. The outcome is that the empirical distribution fits the uniform distribution:

The computed average of the distribution is 50.31 ms, which is a good approximation of the delay parameter of the target, 50 ms. The observed minimum and maximum of the uniform distribution are 44.75 ms and 55.87 ms respectively, which are close to the theoretical values of 45 ms and 55 ms, allowing for statistical error.
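The fitting method used for this study is not detailed here. One common way to quantify how close an empirical distribution function is to a uniform one is the Kolmogorov-Smirnov distance, that is, the largest vertical gap between the two curves. The sketch below is offered as an illustration of that technique and not as the method actually applied to the rl data; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Cumulative distribution function of the uniform law on [a, b]. */
static double uniform_cdf(double x, double a, double b)
{
    if (x <= a)
        return 0.0;
    if (x >= b)
        return 1.0;
    return (x - a) / (b - a);
}

/*
 * Sort x in place and return the largest gap between the empirical
 * distribution function of the n samples and the CDF of U(a, b).
 */
double ks_distance_uniform(double *x, size_t n, double a, double b)
{
    double d = 0.0;

    assert(x != NULL && n > 0 && a < b);

    qsort(x, n, sizeof(*x), cmp_double);
    for (size_t i = 0; i < n; i++) {
        double f = uniform_cdf(x[i], a, b);
        /* The empirical function jumps from i/n to (i+1)/n at x[i]. */
        double above = (double)(i + 1) / (double)n - f;
        double below = f - (double)i / (double)n;

        if (above > d)
            d = above;
        if (below > d)
            d = below;
    }
    return d;
}
```

A distance close to 0 means the empirical distribution function is nearly the straight line expected between a and b, while a value close to 1 means the sample is concentrated far from what a uniform law predicts. For the rdelay-uniform mapping tested here, a and b would be 45 and 55 ms.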
In this article we saw how to modify the dm-delay device mapper target to add a predefined amount of randomness to the delay parameter. The code modification led to a new mapping target called rdelay-uniform that can be compiled as a kernel module. We ran timing tests from an application's point of view and applied fitting methods to infer the statistics behind the results.
The outcome is that the rdelay-uniform target delays the I/O completion time according to a uniform distribution centered on the delay parameter, with a random amplitude equal to the rand parameter. We conclude that the kernel patch meets our initial expectations.
A. Patch source
diff -uprN -X linux-4.1.3-rdelay/Documentation/dontdiff linux-4.1.3-vanilla/drivers/md/dm-rdelay-uniform.c linux-4.1.3-rdelay/drivers/md/dm-rdelay-uniform.c
--- linux-4.1.3-vanilla/drivers/md/dm-rdelay-uniform.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-4.1.3-rdelay/drivers/md/dm-rdelay-uniform.c 2016-02-29 12:36:19.395737778 +0100
@@ -0,0 +1,406 @@
+/*
+ * Copyright (C) 2016 Guillaume Kielwasser
+ *
+ * A target that delays reads and/or writes randomly and can send
+ * them to different devices.
+ *
+ * This file is released under the GPL.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/blkdev.h>
+#include <linux/bio.h>
+#include <linux/slab.h>
+#include <linux/random.h>
+
+#include <linux/device-mapper.h>
+
+#define DM_MSG_PREFIX "rdelay-uniform"
+
+struct delay_c {
+ struct timer_list delay_timer;
+ struct mutex timer_lock;
+ struct workqueue_struct *kdelayd_wq;
+ struct work_struct flush_expired_bios;
+ struct list_head delayed_bios;
+ atomic_t may_delay;
+
+ struct dm_dev *dev_read;
+ sector_t start_read;
+ unsigned read_delay;
+ unsigned reads;
+ unsigned read_maxdev;
+
+ struct dm_dev *dev_write;
+ sector_t start_write;
+ unsigned write_delay;
+ unsigned writes;
+ unsigned write_maxdev;
+};
+
+struct dm_delay_info {
+ struct delay_c *context;
+ struct list_head list;
+ unsigned long expires;
+};
+
+static DEFINE_MUTEX(delayed_bios_lock);
+
+static void handle_delayed_timer(unsigned long data)
+{
+ struct delay_c *dc = (struct delay_c *)data;
+
+ queue_work(dc->kdelayd_wq, &dc->flush_expired_bios);
+}
+
+static void queue_timeout(struct delay_c *dc, unsigned long expires)
+{
+ mutex_lock(&dc->timer_lock);
+
+ if (!timer_pending(&dc->delay_timer) || expires < dc->delay_timer.expires)
+ mod_timer(&dc->delay_timer, expires);
+
+ mutex_unlock(&dc->timer_lock);
+}
+
+static void flush_bios(struct bio *bio)
+{
+ struct bio *n;
+
+ while (bio) {
+ n = bio->bi_next;
+ bio->bi_next = NULL;
+ generic_make_request(bio);
+ bio = n;
+ }
+}
+
+static struct bio *flush_delayed_bios(struct delay_c *dc, int flush_all)
+{
+ struct dm_delay_info *delayed, *next;
+ unsigned long next_expires = 0;
+ int start_timer = 0;
+ struct bio_list flush_bios = { };
+
+ mutex_lock(&delayed_bios_lock);
+ list_for_each_entry_safe(delayed, next, &dc->delayed_bios, list) {
+ if (flush_all || time_after_eq(jiffies, delayed->expires)) {
+ struct bio *bio = dm_bio_from_per_bio_data(delayed,
+ sizeof(struct dm_delay_info));
+ list_del(&delayed->list);
+ bio_list_add(&flush_bios, bio);
+ if ((bio_data_dir(bio) == WRITE))
+ delayed->context->writes--;
+ else
+ delayed->context->reads--;
+ continue;
+ }
+
+ if (!start_timer) {
+ start_timer = 1;
+ next_expires = delayed->expires;
+ } else
+ next_expires = min(next_expires, delayed->expires);
+ }
+
+ mutex_unlock(&delayed_bios_lock);
+
+ if (start_timer)
+ queue_timeout(dc, next_expires);
+
+ return bio_list_get(&flush_bios);
+}
+
+static void flush_expired_bios(struct work_struct *work)
+{
+ struct delay_c *dc;
+
+ dc = container_of(work, struct delay_c, flush_expired_bios);
+ flush_bios(flush_delayed_bios(dc, 0));
+}
+
+/*
+ * Mapping parameters:
+ * <device> <offset> <delay> <maxdev> [<write_device> <write_offset> <write_delay> <write_maxdev>]
+ *
+ * With separate write parameters, the first set is only used for reads.
+ * Delays are specified in milliseconds.
+ */
+static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+ struct delay_c *dc;
+ unsigned long long tmpll;
+ char dummy;
+
+ if (argc != 4 && argc != 8) {
+ ti->error = "requires exactly 4 or 8 arguments";
+ return -EINVAL;
+ }
+
+ dc = kmalloc(sizeof(*dc), GFP_KERNEL);
+ if (!dc) {
+ ti->error = "Cannot allocate context";
+ return -ENOMEM;
+ }
+
+ dc->reads = dc->writes = 0;
+
+ if (sscanf(argv[1], "%llu%c", &tmpll, &dummy) != 1) {
+ ti->error = "Invalid device sector";
+ goto bad;
+ }
+ dc->start_read = tmpll;
+
+ if (sscanf(argv[2], "%u%c", &dc->read_delay, &dummy) != 1) {
+ ti->error = "Invalid delay";
+ goto bad;
+ }
+
+ if (sscanf(argv[3], "%u%c", &dc->read_maxdev, &dummy) != 1) {
+ ti->error = "Invalid maxdev";
+ goto bad;
+ }
+ if (dc->read_maxdev > dc->read_delay) {
+ ti->error = "maxdev superior to the delay";
+ goto bad;
+ }
+ /* if maxdev is even (i.e. maxdev modulo 2 equals 0), add 1 so it is odd */
+ if (! (dc->read_maxdev % 2))
+ dc->read_maxdev += 1;
+
+ if (dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+ &dc->dev_read)) {
+ ti->error = "Device lookup failed";
+ goto bad;
+ }
+
+ dc->dev_write = NULL;
+ if (argc == 4)
+ goto out;
+
+ if (sscanf(argv[5], "%llu%c", &tmpll, &dummy) != 1) {
+ ti->error = "Invalid write device sector";
+ goto bad_dev_read;
+ }
+ dc->start_write = tmpll;
+
+ if (sscanf(argv[6], "%u%c", &dc->write_delay, &dummy) != 1) {
+ ti->error = "Invalid write delay";
+ goto bad_dev_read;
+ }
+ if (sscanf(argv[7], "%u%c", &dc->write_maxdev, &dummy) != 1) {
+ ti->error = "Invalid write maxdev";
+ goto bad_dev_read;
+ }
+ if (dc->write_maxdev > dc->write_delay) {
+ ti->error = "write maxdev superior to the write delay";
+ /* dev_read is already held at this point, so release it on error */
+ goto bad_dev_read;
+ }
+ /* if maxdev is even (i.e. maxdev modulo 2 equals 0), add 1 so it is odd */
+ if (! (dc->write_maxdev % 2))
+ dc->write_maxdev += 1;
+
+ if (dm_get_device(ti, argv[4], dm_table_get_mode(ti->table),
+ &dc->dev_write)) {
+ ti->error = "Write device lookup failed";
+ goto bad_dev_read;
+ }
+
+out:
+ dc->kdelayd_wq = alloc_workqueue("kdelayd", WQ_MEM_RECLAIM, 0);
+ if (!dc->kdelayd_wq) {
+ DMERR("Couldn't start kdelayd");
+ goto bad_queue;
+ }
+
+ setup_timer(&dc->delay_timer, handle_delayed_timer, (unsigned long)dc);
+
+ INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);
+ INIT_LIST_HEAD(&dc->delayed_bios);
+ mutex_init(&dc->timer_lock);
+ atomic_set(&dc->may_delay, 1);
+
+ ti->num_flush_bios = 1;
+ ti->num_discard_bios = 1;
+ ti->per_bio_data_size = sizeof(struct dm_delay_info);
+ ti->private = dc;
+ return 0;
+
+bad_queue:
+ if (dc->dev_write)
+ dm_put_device(ti, dc->dev_write);
+bad_dev_read:
+ dm_put_device(ti, dc->dev_read);
+bad:
+ kfree(dc);
+ return -EINVAL;
+}
+
+static void delay_dtr(struct dm_target *ti)
+{
+ struct delay_c *dc = ti->private;
+
+ destroy_workqueue(dc->kdelayd_wq);
+
+ dm_put_device(ti, dc->dev_read);
+
+ if (dc->dev_write)
+ dm_put_device(ti, dc->dev_write);
+
+ kfree(dc);
+}
+
+static int delay_bio(struct delay_c *dc, int delay, int maxdev, struct bio *bio)
+{
+ struct dm_delay_info *delayed;
+ unsigned long expires = 0;
+ unsigned int i = 0, random = 0;
+
+ get_random_bytes(&i, sizeof(int));
+ random = i % maxdev;
+ pr_info("get_random_bytes: %i, random ms: %i\n", i, random);
+
+ if (!delay || !atomic_read(&dc->may_delay))
+ return 1;
+
+ delayed = dm_per_bio_data(bio, sizeof(struct dm_delay_info));
+
+ delayed->context = dc;
+
+ delayed->expires = expires = jiffies +
+ msecs_to_jiffies(delay - maxdev/2 + random);
+
+ mutex_lock(&delayed_bios_lock);
+
+ if (bio_data_dir(bio) == WRITE)
+ dc->writes++;
+ else
+ dc->reads++;
+
+ list_add_tail(&delayed->list, &dc->delayed_bios);
+
+ mutex_unlock(&delayed_bios_lock);
+
+ queue_timeout(dc, expires);
+
+ return 0;
+}
+
+static void delay_presuspend(struct dm_target *ti)
+{
+ struct delay_c *dc = ti->private;
+
+ atomic_set(&dc->may_delay, 0);
+ del_timer_sync(&dc->delay_timer);
+ flush_bios(flush_delayed_bios(dc, 1));
+}
+
+static void delay_resume(struct dm_target *ti)
+{
+ struct delay_c *dc = ti->private;
+
+ atomic_set(&dc->may_delay, 1);
+}
+
+static int delay_map(struct dm_target *ti, struct bio *bio)
+{
+ struct delay_c *dc = ti->private;
+
+ if ((bio_data_dir(bio) == WRITE) && (dc->dev_write)) {
+ bio->bi_bdev = dc->dev_write->bdev;
+ if (bio_sectors(bio))
+ bio->bi_iter.bi_sector = dc->start_write +
+ dm_target_offset(ti, bio->bi_iter.bi_sector);
+
+ return delay_bio(dc, dc->write_delay, dc->write_maxdev, bio);
+ }
+
+ bio->bi_bdev = dc->dev_read->bdev;
+ bio->bi_iter.bi_sector = dc->start_read +
+ dm_target_offset(ti, bio->bi_iter.bi_sector);
+
+ return delay_bio(dc, dc->read_delay, dc->read_maxdev, bio);
+}
+
+static void delay_status(struct dm_target *ti, status_type_t type,
+ unsigned status_flags, char *result, unsigned maxlen)
+{
+ struct delay_c *dc = ti->private;
+ int sz = 0;
+
+ switch (type) {
+ case STATUSTYPE_INFO:
+ DMEMIT("%u %u", dc->reads, dc->writes);
+ break;
+
+ case STATUSTYPE_TABLE:
+ DMEMIT("%s %llu %u %i", dc->dev_read->name,
+ (unsigned long long) dc->start_read,
+ dc->read_delay, dc->read_maxdev);
+ if (dc->dev_write)
+ DMEMIT(" %s %llu %u %u", dc->dev_write->name,
+ (unsigned long long) dc->start_write,
+ dc->write_delay, dc->write_maxdev);
+ break;
+ }
+}
+
+static int delay_iterate_devices(struct dm_target *ti,
+ iterate_devices_callout_fn fn, void *data)
+{
+ struct delay_c *dc = ti->private;
+ int ret = 0;
+
+ ret = fn(ti, dc->dev_read, dc->start_read, ti->len, data);
+ if (ret)
+ goto out;
+
+ if (dc->dev_write)
+ ret = fn(ti, dc->dev_write, dc->start_write, ti->len, data);
+
+out:
+ return ret;
+}
+
+static struct target_type delay_target = {
+ .name = "rdelay-uniform",
+ .version = {1, 0, 0},
+ .module = THIS_MODULE,
+ .ctr = delay_ctr,
+ .dtr = delay_dtr,
+ .map = delay_map,
+ .presuspend = delay_presuspend,
+ .resume = delay_resume,
+ .status = delay_status,
+ .iterate_devices = delay_iterate_devices,
+};
+
+static int __init dm_delay_init(void)
+{
+ int r;
+
+ r = dm_register_target(&delay_target);
+ if (r < 0) {
+ DMERR("register failed %d", r);
+ goto bad_register;
+ }
+
+ return 0;
+
+bad_register:
+ return r;
+}
+
+static void __exit dm_delay_exit(void)
+{
+ dm_unregister_target(&delay_target);
+}
+
+/* Module hooks */
+module_init(dm_delay_init);
+module_exit(dm_delay_exit);
+
+MODULE_DESCRIPTION(DM_NAME " uniform random delay target");
+MODULE_AUTHOR("Guillaume Kielwasser");
+MODULE_LICENSE("GPL");
diff -uprN -X linux-4.1.3-rdelay/Documentation/dontdiff linux-4.1.3-vanilla/drivers/md/Kconfig linux-4.1.3-rdelay/drivers/md/Kconfig
--- linux-4.1.3-vanilla/drivers/md/Kconfig 2015-07-21 19:10:33.000000000 +0200
+++ linux-4.1.3-rdelay/drivers/md/Kconfig 2016-02-29 12:35:56.307736522 +0100
@@ -413,6 +413,16 @@ config DM_DELAY
If unsure, say N.
+config DM_RDELAY_UNIFORM
+ tristate "I/O random delaying target (uniform distribution)"
+ depends on BLK_DEV_DM
+ ---help---
+ A target that delays reads and/or writes randomly using the uniform
+ distribution and can send them to different devices. Based on the
+ dm-delay module. Useful for testing.
+
+ If unsure, say N.
+
config DM_UEVENT
bool "DM uevents"
depends on BLK_DEV_DM
diff -uprN -X linux-4.1.3-rdelay/Documentation/dontdiff linux-4.1.3-vanilla/drivers/md/Makefile linux-4.1.3-rdelay/drivers/md/Makefile
--- linux-4.1.3-vanilla/drivers/md/Makefile 2015-07-21 19:10:33.000000000 +0200
+++ linux-4.1.3-rdelay/drivers/md/Makefile 2016-02-29 12:35:56.307736522 +0100
@@ -39,6 +39,7 @@ obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
obj-$(CONFIG_DM_BIO_PRISON) += dm-bio-prison.o
obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
obj-$(CONFIG_DM_DELAY) += dm-delay.o
+obj-$(CONFIG_DM_RDELAY_UNIFORM) += dm-rdelay-uniform.o
obj-$(CONFIG_DM_FLAKEY) += dm-flakey.o
obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o
obj-$(CONFIG_DM_MULTIPATH_QL) += dm-queue-length.o