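We can make our code more efficient by buffering its input and output. We create an input buffer and read a whole sequence of bytes at once. Then we fetch them from the buffer one by one.
We also create an output buffer and store our output in it until it is full. At that point we ask the kernel to write the contents of the buffer to stdout.
The program ends when there is no more input. But we still need to ask the kernel to write the contents of our output buffer to stdout one last time; otherwise some of our output would make it into the buffer but never be sent out. Do not forget this step, or you will be left wondering why some of your output is missing.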
%include 'system.inc'
%define BUFSIZE 2048
section .data
hex db '0123456789ABCDEF'
section .bss
ibuffer resb BUFSIZE
obuffer resb BUFSIZE
section .text
global _start
_start:
sub eax, eax
sub ebx, ebx
sub ecx, ecx
mov edi, obuffer
.loop:
; read a byte from stdin
call getchar
; convert it to hex
mov dl, al
shr al, 4
mov al, [hex+eax]
call putchar
mov al, dl
and al, 0Fh
mov al, [hex+eax]
call putchar
mov al, ' '
cmp dl, 0Ah
jne .put
mov al, dl
.put:
call putchar
jmp short .loop
align 4
getchar:
or ebx, ebx
jne .fetch
call read
.fetch:
lodsb
dec ebx
ret
read:
push dword BUFSIZE
mov esi, ibuffer
push esi
push dword stdin
sys.read
add esp, byte 12
mov ebx, eax
or eax, eax
je .done
sub eax, eax
ret
align 4
.done:
call write ; flush output buffer
push dword 0
sys.exit
align 4
putchar:
stosb
inc ecx
cmp ecx, BUFSIZE
je write
ret
align 4
write:
sub edi, ecx ; start of buffer
push ecx
push edi
push dword stdout
sys.write
add esp, byte 12
sub eax, eax
sub ecx, ecx ; buffer is empty now
ret
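Our program now has a third section, named .bss.
This section is not included in our executable file and, therefore, cannot be initialized. We use resb instead of db.
It simply reserves the requested amount of uninitialized memory for our use.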
We take advantage of the fact that the system does not modify the registers: We use registers for what, otherwise, would have to be global variables stored in the .data section. This is also why the UNIX® convention of passing parameters to system calls on the stack is superior to the Microsoft convention of passing them in the registers: We can keep the registers for our own use.
We use ESI and EDI as pointers to the next byte to be read from the input buffer and written to the output buffer, respectively. We use EBX and ECX to keep count of the number of bytes in the input and output buffers, so we know when to read more input from, or dump the output to, the system.
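For readers more at home in C, the bookkeeping that the listing above keeps in registers can be sketched as a small C model. This is only an illustration under stated assumptions, not part of hex.asm: the struct bufio type, its in_fd and out_fd fields, and the buf_* function names are hypothetical names introduced here. Unlike the assembly, the sketch returns -1 at end of input instead of exiting from inside read, and it retries short writes.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define BUFSIZE 2048

/* Hypothetical C model of the state hex.asm keeps in registers. */
struct bufio {
    int in_fd, out_fd;              /* assumed descriptors, e.g. 0 and 1 */
    unsigned char ibuffer[BUFSIZE];
    unsigned char obuffer[BUFSIZE];
    unsigned char *in_next;         /* plays the role of ESI */
    size_t in_left;                 /* plays the role of EBX */
    size_t out_used;                /* plays the role of ECX; obuffer + out_used is EDI */
};

/* Write out whatever is buffered, then mark the output buffer empty. */
static void buf_flush(struct bufio *b)
{
    size_t done = 0;
    while (done < b->out_used) {
        ssize_t n = write(b->out_fd, b->obuffer + done, b->out_used - done);
        if (n <= 0)
            break;                  /* give up on error; a real program would report it */
        done += (size_t)n;
    }
    b->out_used = 0;
}

/* Return the next input byte, or -1 at end of input or on a read error. */
static int buf_getchar(struct bufio *b)
{
    if (b->in_left == 0) {          /* buffer empty: ask the kernel for more */
        ssize_t n = read(b->in_fd, b->ibuffer, BUFSIZE);
        if (n <= 0)
            return -1;
        b->in_next = b->ibuffer;
        b->in_left = (size_t)n;
    }
    b->in_left--;
    return *b->in_next++;
}

/* Store one byte; flush only when the buffer has filled up. */
static void buf_putchar(struct bufio *b, unsigned char c)
{
    assert(b->out_used < BUFSIZE);
    b->obuffer[b->out_used++] = c;
    if (b->out_used == BUFSIZE)
        buf_flush(b);
}
```

A filter built on this sketch would call buf_getchar in a loop, hand each result to buf_putchar, and call buf_flush once after the loop. That last call corresponds to the final call write before sys.exit in the listing.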
Let us see how it works now:
% nasm -f elf hex.asm
% ld -s -o hex hex.o
% ./hex
Hello, World!
Here I come!
48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0A
48 65 72 65 20 49 20 63 6F 6D 65 21 0A
^D %
Not what you expected? The program did not print the output until we pressed ^D. That is easy to fix by inserting three lines of code to write the output every time we have converted a new line to 0A. I have marked the three lines with > (do not copy the > in your hex.asm).
%include 'system.inc'
%define BUFSIZE 2048
section .data
hex db '0123456789ABCDEF'
section .bss
ibuffer resb BUFSIZE
obuffer resb BUFSIZE
section .text
global _start
_start:
sub eax, eax
sub ebx, ebx
sub ecx, ecx
mov edi, obuffer
.loop:
; read a byte from stdin
call getchar
; convert it to hex
mov dl, al
shr al, 4
mov al, [hex+eax]
call putchar
mov al, dl
and al, 0Fh
mov al, [hex+eax]
call putchar
mov al, ' '
cmp dl, 0Ah
jne .put
mov al, dl
.put:
call putchar
> cmp al, 0Ah
> jne .loop
> call write
jmp short .loop
align 4
getchar:
or ebx, ebx
jne .fetch
call read
.fetch:
lodsb
dec ebx
ret
read:
push dword BUFSIZE
mov esi, ibuffer
push esi
push dword stdin
sys.read
add esp, byte 12
mov ebx, eax
or eax, eax
je .done
sub eax, eax
ret
align 4
.done:
call write ; flush output buffer
push dword 0
sys.exit
align 4
putchar:
stosb
inc ecx
cmp ecx, BUFSIZE
je write
ret
align 4
write:
sub edi, ecx ; start of buffer
push ecx
push edi
push dword stdout
sys.write
add esp, byte 12
sub eax, eax
sub ecx, ecx ; buffer is empty now
ret
Now, let us see how it works:
% nasm -f elf hex.asm
% ld -s -o hex hex.o
% ./hex
Hello, World!
48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0A
Here I come!
48 65 72 65 20 49 20 63 6F 6D 65 21 0A
^D %
Not bad for a 644-byte executable, is it!
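The effect of the three marked lines can also be sketched in C. The following is a rough model, not a translation of hex.asm: the function names flush_to, put, and put_hex are hypothetical, and the output descriptor is passed as a parameter purely for illustration. It converts one byte to two hex digits through the same lookup table and flushes the output buffer whenever the converted byte was a new line.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define BUFSIZE 2048

static const char hex[] = "0123456789ABCDEF";
static unsigned char obuffer[BUFSIZE];
static size_t out_used;                 /* plays the role of ECX */

/* Write the buffered bytes to fd and mark the buffer empty. */
static void flush_to(int fd)
{
    size_t done = 0;
    while (done < out_used) {
        ssize_t n = write(fd, obuffer + done, out_used - done);
        if (n <= 0)
            break;                      /* give up on error in this sketch */
        done += (size_t)n;
    }
    out_used = 0;
}

/* Buffer one byte; flush only when the buffer is full. */
static void put(int fd, unsigned char c)
{
    assert(out_used < BUFSIZE);
    obuffer[out_used++] = c;
    if (out_used == BUFSIZE)
        flush_to(fd);
}

/* Emit one input byte as two hex digits followed by a separator. */
static void put_hex(int fd, unsigned char byte)
{
    put(fd, (unsigned char)hex[byte >> 4]);     /* high nibble */
    put(fd, (unsigned char)hex[byte & 0x0F]);   /* low nibble */
    if (byte == 0x0A) {
        put(fd, '\n');
        flush_to(fd);                   /* the added step: flush after each new line */
    } else {
        put(fd, ' ');
    }
}
```

Flushing on every new line trades some efficiency for interactivity: each input line now costs one write system call, but the user sees the converted line immediately.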
Note: This approach to buffered input/output still contains a hidden danger. I will discuss, and fix, it later, when I talk about the dark side of buffering.
Warning: This may be a somewhat advanced topic, mostly of interest to programmers familiar with the theory of compilers. If you wish, you may skip to the next section, and perhaps read this later.
While our sample program does not require it, more sophisticated filters often need to look ahead. In other words, they may need to see what the next character is (or even several characters). If the next character is of a certain value, it is part of the token currently being processed. Otherwise, it is not.
For example, you may be parsing the input stream for a textual string (e.g., when implementing a language compiler): If a letter is followed by another letter, or perhaps a digit, it is part of the token you are processing. If it is followed by white space, or some other value, then it is not part of the current token.
This presents an interesting problem: How to return the next character back to the input stream, so it can be read again later?
One possible solution is to store it in a character variable, then set a flag. We can modify getchar to check the flag, and if it is set, fetch the byte from that variable instead of the input buffer, and reset the flag. But, of course, that slows us down.
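To make the cost of this flag-based approach concrete, here is a minimal C sketch of it. The names get_byte, unget_byte, have_pushback, and pushback are hypothetical and only model the idea; they are not part of hex.asm. Note the extra test at the top of get_byte, which runs on every call whether or not a character was ever returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define BUFSIZE 2048

static unsigned char ibuffer[BUFSIZE];
static unsigned char *in_next;      /* plays the role of ESI */
static size_t in_left;              /* plays the role of EBX */

static bool have_pushback;          /* the flag */
static unsigned char pushback;      /* the saved character */

/* Return the next byte from fd, or -1 at end of input or on error. */
static int get_byte(int fd)
{
    if (have_pushback) {            /* extra check paid on every single call */
        have_pushback = false;
        return pushback;
    }
    if (in_left == 0) {
        ssize_t n = read(fd, ibuffer, BUFSIZE);
        if (n <= 0)
            return -1;
        in_next = ibuffer;
        in_left = (size_t)n;
    }
    in_left--;
    return *in_next++;
}

/* Hand one byte back so the next get_byte returns it again. */
static void unget_byte(unsigned char c)
{
    assert(!have_pushback);         /* this scheme can hold only one character */
    pushback = c;
    have_pushback = true;
}
```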
The C language has an ungetc() function, just for that purpose. Is there a quick way to implement it in our code? I would like you to scroll back up and take a look at the getchar procedure and see if you can find a nice and fast solution before reading the next paragraph. Then come back here and see my own solution.
The key to returning a character back to the stream is in how we are getting the characters to start with:
First we check if the buffer is empty by testing the value of EBX. If it is zero, we call the read procedure.
If we do have a character available, we use lodsb, then decrease the value of EBX. The lodsb instruction is effectively identical to:
mov al, [esi]
inc esi
The byte we have fetched remains in the buffer until the next time read is called. We do not know when that happens, but we do know it will not happen until the next call to getchar. Hence, to "return" the last-read byte back to the stream, all we have to do is decrease the value of ESI and increase the value of EBX:
ungetc:
dec esi
inc ebx
ret
But, be careful! We are perfectly safe doing this if our look-ahead is at most one character at a time. If we are examining more than one upcoming character and call ungetc several times in a row, it will work most of the time, but not all the time (and will be tough to debug). Why?
Because as long as getchar does not have to call read, all of the pre-read bytes are still in the buffer, and our ungetc works without a glitch. But the moment getchar calls read, the contents of the buffer change.
We can always rely on ungetc working properly on the last character we have read with getchar, but not on anything we have read before that.
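The limit described above can be seen in a small C model of the pointer-rewinding ungetc. This is a sketch under assumptions: BUFSIZE is deliberately shrunk to 4 so that a refill is easy to trigger, the names get_byte and unget_byte are hypothetical, and the boolean return value is an addition of this sketch that reports the point where the assembly version would silently step outside ibuffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define BUFSIZE 4                   /* tiny on purpose, to force refills */

static unsigned char ibuffer[BUFSIZE];
static size_t in_pos;               /* ibuffer + in_pos plays the role of ESI */
static size_t in_left;              /* plays the role of EBX */

/* Return the next byte from fd, or -1 at end of input or on error. */
static int get_byte(int fd)
{
    if (in_left == 0) {
        ssize_t n = read(fd, ibuffer, BUFSIZE);  /* overwrites the old bytes */
        if (n <= 0)
            return -1;
        in_pos = 0;
        in_left = (size_t)n;
    }
    in_left--;
    return ibuffer[in_pos++];
}

/* Model of "dec esi / inc ebx". Returns false when rewinding would leave
 * the buffer, i.e. when the previous byte was lost to a refill. */
static bool unget_byte(void)
{
    if (in_pos == 0)
        return false;
    in_pos--;
    in_left++;
    return true;
}
```

With the input "abcdef", the fifth get_byte triggers a refill. One unget_byte after it succeeds, but a second one in a row cannot, because the byte 'd' is no longer in the buffer.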
If your program reads more than one byte ahead, you have at least two choices:
If possible, modify the program so it only reads one byte ahead. This is the simplest solution.
If that option is not available, first of all determine the maximum number of characters your program needs to return to the input stream at one time. Increase that number slightly, just to be sure, preferably to a multiple of 16, so it aligns nicely. Then modify the .bss section of your code, and create a small "spare" buffer right before your input buffer, something like this:
section .bss
resb 16 ; or whatever the value you came up with
ibuffer resb BUFSIZE
obuffer resb BUFSIZE
You also need to modify your ungetc to pass the value of the byte to unget in AL:
ungetc:
dec esi
inc ebx
mov [esi], al
ret
With this modification, you can call ungetc up to 17 times in a row safely (the first call will still be within the buffer, the remaining 16 may be either within the buffer or within the "spare").
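The spare-buffer layout can likewise be modeled in C by reserving the spare bytes and the input buffer as one contiguous array. This is an illustrative sketch, not the author's code: the names storage, SPARE, get_byte, and unget_byte are hypothetical, BUFSIZE is again shrunk to 4 to force a refill, and the assertion guarding the start of the spare area is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define BUFSIZE 4                   /* tiny on purpose, to force refills */
#define SPARE   16                  /* size of the "spare" area */

/* One contiguous block: SPARE bytes immediately followed by the input buffer. */
static unsigned char storage[SPARE + BUFSIZE];
static unsigned char *const ibuffer = storage + SPARE;
static unsigned char *in_next = storage + SPARE;    /* plays the role of ESI */
static size_t in_left;                               /* plays the role of EBX */

/* Return the next byte from fd, or -1 at end of input or on error. */
static int get_byte(int fd)
{
    if (in_left == 0) {
        ssize_t n = read(fd, ibuffer, BUFSIZE);
        if (n <= 0)
            return -1;
        in_next = ibuffer;
        in_left = (size_t)n;
    }
    in_left--;
    return *in_next++;
}

/* Model of "dec esi / inc ebx / mov [esi], al": the caller supplies the byte,
 * so it no longer matters whether the old copy survived a refill. */
static void unget_byte(unsigned char c)
{
    assert(in_next > storage);      /* still room in the buffer or the spare */
    *--in_next = c;
    in_left++;
}
```

Because the byte is written back explicitly, characters must be returned in the reverse of the order in which they were read, just as with a stack.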