Introduction to Computer Systems 15-213/18-243, spring 2009

Introduction to Computer Systems 15-213/18-243, spring 2009

Ithaca College Machine-Level Programming XI: inline Assembly Comp 21000: Introduction to Computer Systems & Assembly Lang On-Line resources* See see http://www.codeproject.com/Articles/15971/Using-Inline-Assembly-in-C-C And 1 Ithaca College Today Inline assembly Examples Exercises Assembly from scratch Examples Exercises

2 Ithaca College Why use assembly? Assembly can express very low-level things: you can access machine-dependent registers and I/O you can control the exact code behavior in critical sections that might otherwise involve deadlock between multiple software threads or hardware devices you can break the conventions of your usual compiler, which might allow some optimizations (like temporarily breaking rules about memory allocation, threading, calling conventions, etc) you can build interfaces between code fragments using incompatible conventions (e.g. produced by different compilers, or separated by a low-level interface) you can get access to unusual programming modes of your processor (e.g. 16 bit mode to interface startup, firmware, or legacy code on Intel PCs) you can produce reasonably fast code for tight loops to cope with a bad nonoptimizing compiler (but then, there are free optimizing compilers available!) you can produce hand-optimized code perfectly tuned for your particular hardware

setup, though not to someone else's you can write some code for your new language's optimizing compiler (that is something that very few people will ever do, and even they not often) i.e. you can be in complete control of your code From the linux assembly howto 3 Ithaca College Why use assembly? Assembly can express very low-level things: there are a number of special registers storing process state information that the operating system must access. There are either special instructions or special memory locations for performing input and output operations. Even for application programmers, there are some machine features, such as the values of the condition codes, that cannot be accessed directly in C. 4 Ithaca College Why use assembly? As Charles Fiterman says on comp.compilers about human vs computergenerated assembly code: The human should always win and here is why. First the human writes the whole thing in a high level language. Second she profiles it to find the hot spots where it spends its time.

Third she has the compiler produce assembly for those small sections of code. Fourth she hand tunes them looking for tiny improvements over the machine generated code. The human wins because she can use the machine. 5 Ithaca College Why use assembly? Speed Be careful! Optimizing compilers are almost always better! useful when you know an assembly language instruction that can replace a library call Example: transcendental function computation has macros for some inline assembly sequences Example: if spend most of the time in a loop computing the sine and cosine of the same angles, could use the fsincos assembly function Well see an example later See this page for bit hacking tricks: http://graphics.stanford.edu/~seander/bithacks.html 6 Ithaca College Speed example

Find the most significant bit. int main (int argc, char* argv[]) { int max = atoi (argv[1]); int number; unsigned position; volatile unsigned result; /* Repeat the operation for a large number of values. */ for (number = 1; number <= max; ++number) { /* Compute the position of the most significant set bit using the bsrl assembly instruction. */ asm ("bsrl %1, %0" : "=r" (position) : "r" (number)); result = position; } printf("Most significant 1 is at position: %d\n", result); return 0; } Will find most significant bit of 100,000,000 numbers but only print the msb of the last int main (int argc, char* argv[]) { int max = atoi (argv[1]); int i, number; unsigned position; volatile unsigned result; /* Repeat the operation for a large number of values. */ for (number = 1; number <= max; ++number) {

/* Repeatedly shift the number to the right, * until the result is zero. Keep count of the number of * shifts this requires. */ for (i = (number >> 1), position = 0; i != 0; ++position) i >>= 1; /* The position of the most significant set bit is the number of shifts we needed after the first one. */ result = position; } printf("most significant bit position is: %d\n", result); return 0; } 7 Ithaca College bsr: bit scan reverse Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source

operand. If the content source operand is 0, the content of the destination operand is undefined. 8 Ithaca College Timing this barr/Student/comp210/resources/assem/findMSB.c barr/Student/comp210/resources/assem/findMSB2.c The command time will time the program that you run [email protected]: time msb 100000000 most significant bit position is: 26 real 0m1.132s user 0m1.124s sys 0m0.004s Program using shift Both programs compiled with optimization level O1 [email protected]: time msb2 100000000 Most significant 1 is at position: 26 real 0m0.115s

user 0m0.112s sys 0m0.000s Program using inline assembly: almost 10 times faster with optimization level O7 first program still took 1.06s! 9 Ithaca College inline assembly code* /* put this line in your C program*/ asm("assembly code"); /* alternative syntax */ __asm__ ("assembly code"); Example: mov instruction asm("movl %rbx, %rax"); /* moves the contents of ebx register to eax */ __asm__("movb %ch, (%rbx)"); /* moves the byte from ch to the memory pointed by ebx */ * see http://www.codeproject.com/Articles/15971/Using-Inline-Assembly-in-C-C 10 Ithaca College

inline assembly More sophisticated assembly For more than one assembly instruction, use semicolon at the end of each instruction of inserted code see example on next slide 11 Ithaca College Example 0 #include int main() { /* Add 10 and 20 and store result into register %eax */ __asm__ ( "mov $10, %rax;" "mov $20, %rbx;" "add %rbx, %rax;" ); compile with O1 flag trace in gdb to see registers change. /* Subtract 20 from 10 and store result into register %eax */ __asm__ ( "mov $10, %rax;" "mov $20, %rbx;" "sub %rbx, %rax;" );

/* Multiply 10 and 20 and store result into register %eax */ __asm__ ( "mov $10, %rax;" "mov $20, %rbx;" "imul %rbx, %rax;" ); return 0 ; } See /home/barr/Student/com p210/resources/assem/i nline1.c 12 Ithaca College Extended inline assembly Idea In extended assembly, we can also specify the operands can specify the input registers, output registers and a list of clobbered registers. asm ( "assembly code" : output operands /* optional */ : input operands /* optional */ : list of clobbered registers /* optional */

); If there are no output operands but there are input operands, we must place two consecutive colons surrounding the place where the output operands would go. Can omit list of clobbered registers to use, GCC and GCCs optimization scheme will take care of the reg. In general, its a bad idea to omit these 13 Ithaca College Example 1 Assm instr Output operands asm ("movq %%rax, %0;" : "=r" ( val )); The variable "val" is kept in a register val is a C variable that must be declared earlier in the C program

the value in register %rax is copied onto that register, and the value of "val" is updated into the memory from that register. note that rax is preceded by 2 percent signs differentiate from a asm parameter (asm works like printf) see ~barr/Student/Comp210/Resources/assem directory for all examples asm ( "assembly code" : output operands /* optional */ : input operands /* optional */ : list of clobbered registers /* optional */ ); 14 Ithaca College Example 1 (continued) asm ("movq %%rax, %0;" : "=r" ( val )); the %0 indicates the first operand of asm, it is associated with the first parameter, i.e., val (similar to printf) =r indicates a register constraint see chart on next page for all possible register specifications r indicates that gcc may keep the variable in any available

General Purpose Register (see next slide) the = indicates write only mode. If our instruction can alter the condition code register, we have to add "cc" to the list of clobbered registers. "g" : Any register, memory or immediate integer operand is allowed, except for registers that are not general registers. 15 Ithaca College register specifiers r register R General register (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP) q General register for data (RAX, RBX, RCX, RDX)

f Floating point reg a %rax,%eax, %ax, %al b %rbx, %ebx, %bx, %bl c %rcx, %ecx, %cx, %cl d %rdx, %edx, %dx, %dl S %rsi, %esi, %si D %rdi, %edi, %di 16

Ithaca College Example 1 continued long no = 100, val ; asm ("movq %1, %%rbx;" "movq %%rbx, %0;" : "=r" ( val ) /* output */ : "r" ( no ) /* input */ : %rbx" /* clobbered register */ ); Two variables are declared in the C code, no and val %0 is the first operand to asm and thus refers to the C variable val (the output variable) %1 is the second operand and thus refers to the C variable no (the input variable)

=r and r say that gcc can use any registers to store the corresponding variable (either val or no ) the clobbered variable is %rbx so gcc should not use that variable anywhere else. See home/barr/Student/comp210/resources/assem/inline1.c 17 Ithaca College Example 2 No variables are declared in the C code Arithmetic performed with registers No values are moved back into C variables. So the result is never used/printed. __asm__ ( "movq $10, %rax;" "movq $20, %rbx;" "addq %rbx, %rax;" );

/* Subtract 20 from 10 and store result into register %eax */ __asm__ ( "movq $10, %rax;" "movq $20, %rbx;" "subq %rbx, %rax;" ); /* Multiply 10 and 20 and store result into register %eax */ __asm__ ( "movq $10, %rax;" "movq $20, %rbx;" "imulq %rbx, %rax;" ); See home/barr/Student/Comp210/Resources/assem/inline2.c 18 Ithaca College Example 3 long arg1, arg2, add ;

arg1 = 10; arg2 = 25; __asm__ ( "addq %%rbx, %%rax;" : "=a" (add) /* output */ : "a" (arg1), "b" (arg2) ); /* input */ The C code declares three variables: arg1, arg2, add The input variables will use %rax (for arg1) and %rbx (for arg2) the output variable is add and will use register %rax Note that dont need the position parameters (%0, %1) if specify registers for the input/output variables. no clobber register is set; gcc can determine See home/barr/Student/comp210/resources/assem/inline3.c 19 Ithaca College #include Example 4 int main() { long arg1, arg2, add, sub, mul, quo, rem ; Example 4 printf( "Enter two integer numbers : " ); scanf( "%ld%ld", &arg1, &arg2 ); /* Perform Addition, Subtraction, Multiplication & Division */

__asm__ ( "addq %%rbx, %%rax;" : "=a" (add) : "a" (arg1) , "b" (arg2) ); __asm__ ( "subq %%rbx, %%rax;" : "=a" (sub) : "a" (arg1) , "b" (arg2) ); __asm__ ( "imulq %%rbx, %%rax;" : "=a" (mul) : "a" (arg1) , "b" (arg2) ); __asm__ ( "movq $0x0, %%rdx;" "movq %2, %%rax; "idivq %%rbx;" : "=a" (quo), "=d" (rem) : "g" (arg1), "g" (arg2) ); printf( "%ld + %ld = %ld\n", arg1, arg2, add ); printf( "%ld - %ld = %ld\n", arg1, arg2, sub ); printf( "%ld * %ld = %ld\n", arg1, arg2, mul ); printf( "%ld / %ld = %ld\n", arg1, arg2, quo ); printf( "%ld %% %ld = %ld\n", arg1, arg2, rem ); return 0 ; idivq S # Signed divide R[%rdx] R[%rdx]:R[%rax] mod S; R[%rax] R[%rdx]:R[%rax] / S } See home/barr/Student/comp210/resources/assem/inline2.c 20 Ithaca College #include Example 4 int main() { long arg1, arg2, arg3, rslt;

In-class exercise printf( "Enter three integer numbers : " ); scanf( "%ld%ld%ld", &arg1, &arg2, &arg3 ); /* Change this line to assembly */ rslt = arg1 + 2 * arg2 - arg3 / 2; printf( "%ld + 2 * %ld %ld / 2 = %ld\n", arg1, arg2, arg3, rslt ); return 0 ; } idivq S # Signed divide R[%rdx] R[%rdx]:R[%rax] mod S; R[%rax] R[%rdx]:R[%rax] / S 21 Ithaca College #include #include Example revisited Example 4 why use inline assembly? no assembly in this code See :

int main (int argc, char* argv[]) { long max = atoi (argv[1]); long number; long i; /home/barr/Student/comp210 unsigned position; volatile unsigned result; /resources/assem/findMSB.c /* Repeat the operation for a large number of values. */ for (number = 1; number <= max; ++number) { /* Repeatedly shift the number to the right, until the result is zero. Keep count of the number of shifts this requires. */ for (i = (number >> 1), position = 0; i != 0; ++position) i >>= 1; /* The position of the most significant set bit is the number of shifts we needed after the first one. */ result = position; } // end outer for loop % gcc -O2 o msb findMSB.c return 0; } // end main % time ./msb 250000000 most significant bit position is: 26 real 0m1.132s user 0m1.124s sys 0m0.004s 22 Ithaca College

#include #include Example revisited Example 4 int main (int argc, char* argv[]) { long max = atoi (argv[1]); long number; unsigned position; volatile unsigned result; why use inline assembly? assembly used for inner loop See: /home/barr/Student/ Comp210/Resources/assem/ findMSB2.c /* Repeat the operation for a large number of values. */ for (number = 1; number <= max; ++number) { /* Compute the position of the most significant set bit using the bsrl assembly instruction. */ asm (bsrl %1, %0 : =r (position) : r (number)); result = position; } // end for loop return 0; } Compare to the 0m1.124s user of the previous C only example!

bsr: Scans source operand for first bit set. Sets ZF if a bit is found set and loads the destination with an index to first set bit. Clears ZF is no bits are found set. BSF scans forward across bit pattern (0-n) while BSR scans in reverse (n-0). %gcc -O2 o msb2 findMSB2.c % time ./msb2 250000000 Most significant 1 is at position: 26 real 0m0.115s user 0m0.112s sys 0m0.000s 23 Ithaca College Volitile If our assembly statement must execute where we put it, (e.g. must not be moved out of a loop as an optimization), put the keyword "volatile" or "__volatile__" after "asm" or "__asm__" and before the ()s. asm volatile ( "...; "...;" : ... ); __asm__ __volatile__ ( "...;" "...;" : ... ); 24

Ithaca College #include Example 4 Example 5 long gcd( long a, long b ) { long result ; Compute GCD with /* Compute Greatest Common Divisor using Euclid's Algorithm */ __asm__ __volatile__ ( "movq %1, %%rax;" Euclids Algm "movq %2, %%rbx;" gcd.c "CONTD: cmpq $0, %%rbx;" "je DONE;" Note that we can put "xorq %%rdx, %%rdx;" "idivq %%rbx;" labels and jump to "movq %%rbx, %%rax;" them! "movq %%rdx, %%rbx;" "jmp CONTD;" "DONE: movq %%rax, %0;" : "=g" (result) : "g" (a), "g" (b) ); Note that we can put several asm return result ;

} instructions in one __asm__ function call int main() { long first, second ; printf( "Enter two integers : " ) ; scanf( "%ld%ld", &first, &second ); printf( "GCD of %ld & %ld is %ld\n", first, second, gcd(first, second) ) ; return 0 ; Do NOT use the optimization parameter when you compile this! } 25 Ithaca College #include int main() { int x, y, rslt, rtnval; int i; printf("Enter two integers\n"); rslt = scanf("%d%d", &x, &y); In-class exercise 2 convert the for loop to assembly code (should be no more than 11 lines of assembly code) /* change into assembly code */ rtnval = 0;

for(i = 0; i < y; i++) { rtnval += x; } /* end change */ printf("%d * %d = %d\n", x, y, rtnval); return 0; } 26

Recently Viewed Presentations

  • A Model for Open Pluggable Edge Services

    A Model for Open Pluggable Edge Services

    Open Pluggable Edge Services OPES Abbie Barbir, Ph.D. [email protected] Summary Presents an overview of OPES model and architecture Core Model Elements are OPES Intermediary OPES Admin Server Remote Call-out Server Introduce Content Services Overlay Networks Current Issues of OPES in...
  • justchinajade.files.wordpress.com

    justchinajade.files.wordpress.com

    Evaluation. When evaluating the effectiveness of the campaign, the Cherokee Rose Writing Project needs to compare: Message Exposure (via media placement): Track the increase of activity on their social media posts.
  • Chapter 14- Genetics

    Chapter 14- Genetics

    Dd Dd dd Dd Dd Dd dd. Female Male. Deaf. Hearing. Figure 9.8 B. Female. Male. Mating. Offspring . Pedigree Key. Human Traits. Recessively Inherited Disorders- Can be mild (albinism) to severe (cystic fibrosis) - A person with a heterozygous...
  • School of Engineering

    School of Engineering

    * Incremental materialization After the training of our network on the DBpedia+BBC ground truth, we use the network to materialize the LUBM graph. LUBM: Benchmark ontology describing the academic world concepts and relations. Algorithm: Running a SPARQL query to collect...
  • Animal Behavior - Guilford County Schools

    Animal Behavior - Guilford County Schools

    Behavioral Cycles Many animals respond to periodic changes in the environment with daily or seasonal cycles of behavior. Circadian rhythm - 24 Hr. cycle of behavior
  • Virginia High School 2009 Mathematics Standards of Learning ...

    Virginia High School 2009 Mathematics Standards of Learning ...

    2009 MATHEMATICS STANDARDS OF LEARNING TRAINING INSTITUTES GEOMETRY Virginia Department of Education Major Changes Content moved to grades 6-8 Complementary, supplementary, and vertical angles Content added Equation of circle given coordinates of center and point on circle Reorganization of reporting...
  • Universal Cell Phone Sound Amplifier

    Universal Cell Phone Sound Amplifier

    Take 10 minutes on your own to sketch 3 or more conceptual drawings of a universal cell phone sound amplifier. You will be sharing these with your team members, so please do not discuss during the time given for your...
  • O Originally the Catholic Church told people that

    O Originally the Catholic Church told people that

    Included kidnapping of Canadian diplomats by members of the FLQ. British Trade Commissioner Kidnapped on October 5, 1970 by the "Liberation Cell" of the FLQ Kidnapping left police with list of demands 1. End of police searchs 2. Publication of...