I originally posted this at http://dtrace.org/blogs/brendan/2011/06/30/tweaking-memory-on-the-fly.
DTrace can modify process address space programmatically based on events, such as user-level functions or syscalls. This could be handy if you have a known software issue, and overwriting some bytes is a convenient workaround until the real software fix is available. When DTrace was first released, there was a popular demo of this making the rounds, which spoofed uname(1).
There are two functions which can do this, and they are two of the most dangerous functions in DTrace: copyout() and copyoutstr(). As a safety measure they are only available when you are using DTrace with the "destructive" option (either via a #pragma D option, or -w). The danger is from using these incorrectly, and accidentally overwriting the wrong data or overwriting the wrong location. This could quickly cause the process to fault and core dump, or worse, it could cause silent data corruption.
I was just testing this for a suggestion regarding a bug with some commercial software in Solaris zones. The software checks the root-level inode numbers as a form of verification, and doesn't like that these are different in a zone. This is how it looks in a normal (global) system:
    system$ ls -lai /
    total 746
             2 drwxr-xr-x  44 root     root        1536 Mar 30 18:03 .
             2 drwxr-xr-x  44 root     root        1536 Mar 30 18:03 ..
    [...]
and here is a zone (this is a Joyent SmartMachine):
    myzone$ ls -lai /
    total 5110
             4 drwxr-xr-x  18 root     root          20 Dec  1  2010 .
             3 drwxr-xr-x  18 root     root          20 Dec  1  2010 ..
    [...]
The first column is the inode number, and in a zone environment the inode numbers for the "." and ".." directories are different (4, 3), betraying the fact that this isn't the real root directory (it's virtualized). For some reason the commercial software doesn't like that, and one suggestion was to use DTrace to tweak getdents(), in lieu of a fixed version.
To try this out, I'll use DTrace to tweak the behavior of the ls(1) command. We'll know it works if running the "ls -lai /" command above shows an inode number of 3 instead of 4 for the first entry.
copyin()
I'll start by checking what ls(1) is actually using to read the directory:
    myzone# dtrace -n 'syscall:::entry /execname == "ls"/ { @[probefunc] = count(); }'
    dtrace: description 'syscall:::entry ' matched 236 probes
    ^C

      fsat                  1
      getpid                1
      getrlimit             1
      open64                1
      read                  1
      readlink              1
      rexit                 1
      sysi86                1
      fcntl                 2
      getdents64            2    <-- found it
      ioctl                 2
      setcontext            2
      sysconfig             2
      mmapobj               3
      fstat64               4
      memcntl               5
      resolvepath           5
      close                 6
      open                  6
      brk                   8
      doorfs                8
      getuid                9
      mmap                  9
      pathconf             20
      lstat64              21
      write                21
      stat64               25
      acl                  40
      gtime                64
With the DTrace one-liner running, I executed "ls -lai /" in another window, then hit Ctrl-C. This shows that it's using getdents64(), the 64-bit version of the get-directory-entry call.
Its prototype, from the man page, is:
int getdents(int fildes, struct dirent *buf, size_t nbyte);
It populates the buffer that buf points to with multiple directory entries, and returns the number of bytes it wrote.
DTrace can examine the state of this buffer when the function returns. Since the buf pointer is pointing to a user-land address, and DTrace is running in kernel-land, in order for DTrace to inspect the data we must use copyin():
    myzone# dtrace -n 'syscall::getdents64:entry /execname == "ls"/ { self->p = arg1; }
        syscall::getdents64:return /self->p/ { tracemem(copyin(self->p, arg1), 100); }'
    dtrace: description 'syscall::getdents64:entry ' matched 2 probes
    CPU     ID                    FUNCTION:NAME
      4    571               getdents64:return
                 0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
             0: 04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
            10: 18 00 2e 00 00 00 00 00 03 00 00 00 00 00 00 00  ................
            20: 02 00 00 00 00 00 00 00 18 00 2e 2e 00 00 00 00  ................
            30: 0e 00 00 00 00 00 00 00 37 2b fb 11 00 00 00 00  ........7+......
            40: 18 00 62 69 6e 00 00 00 0d 00 00 00 00 00 00 00  ..bin...........
            50: fd 00 fd 11 00 00 00 00 18 00 75 73 72 00 00 00  ..........usr...
            60: 0f 00 00 00                                      ....

      4    571               getdents64:return
                 0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
             0: b7 c6 82 13 00 00 00 00 18 00 65 74 63 00 00 00  ..........etc...
            10: 53 00 02 00 00 00 00 00 83 d7 18 15 00 00 00 00  S...............
            20: 18 00 63 6f 72 65 00 00 3b 1f 00 00 00 00 00 00  ..core..;.......
            30: 3e 46 b1 16 00 00 00 00 18 00 72 6f 6f 74 00 00  >F........root..
            40: f6 01 00 00 00 00 00 00 4e 4e d8 16 00 00 00 00  ........NN......
            50: 18 00 70 72 6f 63 00 00 0a 00 00 00 00 00 00 00  ..proc..........
            60: 53 c1 b7 17                                      S...
Inode numbers and directory names are visible in the buffer. I printed it using tracemem() which does the neat hex-dumps.
- 1st caveat: The "100" in the above one-liner is the length for tracemem(); unfortunately, it must be a scalar constant and can't be a variable, otherwise I'd use "arg1" to print the entire returned length. (The reason is so DTrace can reliably calculate needed buffer space before the probe enablings.)
The very first byte in the first return is the one we want to change - from 4 to 3.
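To make the layout of that dump easier to read, here is a small offline sketch in Python that decodes the first 0x48 bytes of it. The record layout is an assumption inferred from the dump itself rather than taken from dirent.h: an 8-byte d_ino, an 8-byte d_off, a 2-byte d_reclen (all little-endian), then a NUL-terminated d_name. The function name parse_dirents is made up for this example.

```python
import struct

# First 0x48 bytes of the first tracemem() dump shown above.
DUMP = bytes.fromhex(
    "04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 "
    "18 00 2e 00 00 00 00 00 03 00 00 00 00 00 00 00 "
    "02 00 00 00 00 00 00 00 18 00 2e 2e 00 00 00 00 "
    "0e 00 00 00 00 00 00 00 37 2b fb 11 00 00 00 00 "
    "18 00 62 69 6e 00 00 00"
)

# Assumed record header (inferred from the dump, not from dirent.h):
# d_ino (8 bytes), d_off (8 bytes), d_reclen (2 bytes), little-endian,
# followed by a NUL-terminated d_name.
HEADER = struct.Struct("<QqH")


def parse_dirents(buf):
    """Return (offset, d_ino, d_reclen, d_name) for each complete record."""
    entries = []
    pos = 0
    while pos + HEADER.size <= len(buf):
        d_ino, _d_off, d_reclen = HEADER.unpack_from(buf, pos)
        if d_reclen < HEADER.size + 1 or pos + d_reclen > len(buf):
            break  # malformed or truncated record: stop walking
        raw_name = buf[pos + HEADER.size:pos + d_reclen]
        d_name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
        entries.append((pos, d_ino, d_reclen, d_name))
        pos += d_reclen  # d_reclen is the stride to the next record
    return entries


for entry in parse_dirents(DUMP):
    print(entry)
```

Under those assumptions, the "." record sits at offset 0 with inode 4, and ".." follows at offset 24 with inode 3. Walking by d_reclen like this is the search that is awkward to express in D without loops.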
copyout()
While copyout() can write bytes back to user-land, there is a problem to start with: this is an array of directory entries, of which we only want to modify one. DTrace does not currently have loops, so stepping over this array and searching for the "." entry is difficult (one option is unrolled loops).
Instead, I'll assume that the first entry is always the "." entry. That seems to be the case whenever I've tried.
    # cat -n getdents.d
         1  #!/usr/sbin/dtrace -Cs
         2
         3  #pragma D option destructive
         4
         5  #include <dirent.h>
         6
         7  syscall::getdents*:entry
         8  /zonename == "myzone" && execname == "ls"/
         9  {
        10          self->buf = arg1;
        11  }
        12
        13  syscall::getdents*:return
        14  /self->buf && arg1 > 0/
        15  {
        16          /* modify first entry of ls(1) getdents() */
        17          this->dep = (struct dirent *)copyin(self->buf, sizeof (struct dirent));
        18          this->dep->d_ino = 3;
        19          copyout(this->dep, self->buf, sizeof (struct dirent));
        20          exit(0);
        21  }
        22
        23  syscall::getdents*:return
        24  /self->buf/
        25  {
        26          self->buf = 0;
        27  }
Note the destructive pragma on line 3, which is needed to allow the copyout() on line 19.
This script also uses the C preprocessor, enabled by the -C option on line 1, which allows the #include on line 5. That in turn defines the "struct dirent" used on lines 17 to 19.
Lines 23-27 aren't really necessary, as we should have exited on line 20.
Does it work?
    global# ./getdents.d
    dtrace: script './getdents.d' matched 4 probes
    dtrace: allowing destructive actions
    CPU     ID                    FUNCTION:NAME
      3    571               getdents64:return

    myzone# ls -lai /
    total 5110
             3 drwxr-xr-x  18 root     root          20 Dec  1  2010 .
             3 drwxr-xr-x  18 root     root          20 Dec  1  2010 ..
Yes!
But I'm running this from the global zone (hence the check on line 8 for the zonename). I think this would probably make more sense to run within the zone, provided the zone can use DTrace in the first place (for example, having limitpriv="default,dtrace_proc,dtrace_user" in /etc/zones/myzone.xml):
    myzone# ./getdents.d
    dtrace: script './getdents.d' matched 4 probes
    dtrace: allowing destructive actions
    dtrace: error on enabled probe ID 3 (ID 571: syscall::getdents64:return): invalid user access in action #3 at DIF offset 52
    dtrace: error on enabled probe ID 3 (ID 571: syscall::getdents64:return): invalid user access in action #3 at DIF offset 52
This doesn't work. I'm running the ls(1) command as the user "brendan", and dtrace(1M) is running as root. If I run the ls(1) command as root - it works fine.
It looks like, when in a zone, DTrace can only copyout() to processes owned by the same user as the dtrace(1M) process. To have this DTrace script write to a brendan-owned ls(1) command, I had to run the DTrace script as user brendan (which I did by giving a brendan-owned shell DTrace privileges: "ppriv -s A+dtrace_user,dtrace_proc PID"). This looks like a bug in how the kernel privilege checks work for copyout() in zones.
- 2nd caveat: in a zone, copyout() must be from and to the same user.
I hit a 3rd issue as well. I originally wrote the program to trace "getdents64", instead of the wildcard "getdents*" seen in the above script. But that didn't work:
    myzone# grep syscall getdents64.d
    syscall::getdents64:entry
    syscall::getdents64:return
    myzone# ./getdents64.d -c 'ls -lai /'
    dtrace: script './getdents64.d' matched 2 probes
    dtrace: allowing destructive actions
    total 5110
             4 drwxr-xr-x  18 root     root          20 Dec  1  2010 .
             3 drwxr-xr-x  18 root     root          20 Dec  1  2010 ..
    [...]
Now it doesn't even see the events.
Fortunately I don't think this is a DTrace bug, but rather something unintended from #including the dirent.h file and using the C preprocessor. I changed the script to avoid that, and to just treat the first bytes as an int:
    # cat -n getdents64_int.d
         1  #!/usr/sbin/dtrace -s
         2
         3  #pragma D option destructive
         4
         5  syscall::getdents64:entry
         6  /zonename == "myzone" && execname == "ls"/
         7  {
         8          self->buf = arg1;
         9  }
        10
        11  syscall::getdents64:return
        12  /self->buf && arg1 > 0/
        13  {
        14          /* modify first entry of ls(1) getdents() */
        15          this->dep = (int *)alloca(sizeof (int));
        16          this->dep[0] = 3;
        17          copyout(this->dep, self->buf, sizeof (int));
        18          exit(0);
        19  }
        20
        21  syscall::getdents64:return
        22  /self->buf/
        23  {
        24          self->buf = 0;
        25  }
Since this isn't modifying the d_ino member of struct dirent, there seemed little point doing the copyin(), so I've used alloca() on line 15 to create a buffer instead.
Putting this to the test:
    myzone# ./getdents64_int.d -c 'ls -lai /'
    dtrace: script './getdents64_int.d' matched 2 probes
    dtrace: allowing destructive actions
    total 5110
             3 drwxr-xr-x  18 root     root          20 Dec  1  2010 .
             3 drwxr-xr-x  18 root     root          20 Dec  1  2010 ..
    [...]
That's better. Simple works.
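One detail of the int version is worth spelling out. The earlier hex dump suggests the inode field is 8 bytes wide, while sizeof (int) in D is 4, so the copyout() only replaces the low half of the field. The sketch below mimics that in Python on the "." record from the dump. It assumes a little-endian machine (as the dump indicates), and the helper names and the large inode value are made up for illustration.

```python
import struct


def overwrite_low_int(record, value):
    """Mimic copyout() of a 4-byte int onto the start of a dirent record."""
    patched = bytearray(record)
    patched[0:4] = struct.pack("<i", value)  # only bytes 0..3 are replaced
    return bytes(patched)


def d_ino(record):
    """Read the assumed 8-byte little-endian inode field at offset 0."""
    return struct.unpack_from("<Q", record, 0)[0]


# The "." record from the dump: inode 4 stored in an 8-byte field.
dot = bytes.fromhex(
    "04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 18 00 2e 00 00 00 00 00"
)
print(d_ino(dot), "->", d_ino(overwrite_low_int(dot, 3)))

# Hypothetical inode wider than 32 bits: the upper bytes survive the write.
big = struct.pack("<Q", 0x100000004) + dot[8:]
print(hex(d_ino(big)), "->", hex(d_ino(overwrite_low_int(big, 3))))
```

For the small inode numbers seen here the upper four bytes are already zero, so the 4-byte write gives the intended value of 3. On a big-endian machine, or with an inode number above 32 bits, the same write would not produce 3, which is another reason to treat this as a narrow workaround.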
DTrace often works smoothly, but sometimes (as with all software) there can be nits to work around. We can get these fixed. I hope this quick post is useful for anyone else trying this capability.