Sunday, October 14, 2007

Oracle Architecture 3

What Affects Oracle Performance?
Because one of the roles of the DBA is to anticipate, find, and fix performance problems, you must know what types of things affect performance. To understand why these things affect performance, you must first review the basics of how a computer system works.
Overview of Computer Architecture
Your computer system consists of thousands of individual components that work in harmony to process data. Each of these components has its own job to perform, and each has its own performance characteristics.
The brainpower of the system is the Central Processing Unit (CPU), which processes all the calculations and instructions that run on the computer. The job of the rest of the system is to keep the CPU busy with instructions to process. A well-tuned system runs at maximum performance if the CPU or CPUs are busy 100% of the time.
So how does the system keep the CPUs busy? In general, the system consists of different layers, or tiers, of progressively slower components. Because faster components are typically the most expensive, you must perform a balancing act between speed and cost efficiency.
CPU and Cache
New Term: The CPU and the CPU's cache are the fastest components of the system. The cache is high-speed memory used to store recently used data and instructions so that it can provide quick access if this data is used again in a short time. Most CPU hardware designs have a cache built into the CPU chip. This internal cache is known as a Level 1 (or L1) cache. Typically, an L1 cache is quite small--8-16KB.
When a certain piece of data is wanted, the hardware looks first in the L1 cache. If the data is there, it's processed immediately. If the data is not available in the L1 cache, the hardware looks in the L2 cache, which is external to the CPU chip but located close to it. The L2 cache is connected to the CPU chip(s) on the same side of the memory bus as the CPU. To get to main memory, you must use the memory bus, which affects the speed of the memory access.
Although the L2 cache is twice as slow as the L1 cache, it's usually much larger. Its larger size means you have a better chance of getting a cache hit. Typical L2 caches range in size from 128KB to 4MB.
Slower yet is the speed of the system memory--it's probably five times slower than the L2 cache. The size of system memory can range from 4MB for a small desktop PC to 2-4GB for large server machines. Some supercomputers have even more system memory than that.
As you can see from the timeline shown in Figure 2.4, there is an enormous difference between retrieving data from the L1 cache and retrieving data from the disk. This is why you spend so much time trying to take advantage of the SGA in memory. This is also why hardware vendors spend so much time designing CPU caches and fast memory buses.
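To make the tiers concrete, here is a minimal Python sketch of the average cost of one memory reference. It uses the rough ratios given above (L2 about twice as slow as L1, main memory about five times slower than L2). The hit rates and the function name effective_access_time are illustrative assumptions, not measurements from any real system, and disk is left out of the model.

```python
def effective_access_time(l1_time, l1_hit, l2_hit):
    """Average cost of one memory reference, in the same units as l1_time.

    l1_hit and l2_hit are hit rates between 0.0 and 1.0 (assumed values).
    """
    l2_time = 2 * l1_time    # ratio from the text: L2 is about twice as slow as L1
    mem_time = 5 * l2_time   # ratio from the text: memory is about 5x slower than L2
    l1_miss = 1.0 - l1_hit
    l2_miss = 1.0 - l2_hit
    # Every reference pays the L1 lookup; L1 misses also pay the L2 lookup;
    # references that miss both caches also pay the main-memory access.
    return l1_time + l1_miss * (l2_time + l2_miss * mem_time)


if __name__ == "__main__":
    # Hypothetical hit rates, chosen only to show the trend.
    for l1_hit, l2_hit in [(0.90, 0.80), (0.50, 0.50), (0.0, 0.0)]:
        cost = effective_access_time(1.0, l1_hit, l2_hit)
        print(f"L1 hit {l1_hit:.0%}, L2 hit {l2_hit:.0%}: {cost:.2f} x L1 time")
```

Even in this simplified model, the average cost climbs quickly as the hit rates drop, which is the reason the caches matter so much.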
CPU Design
Most instruction processing occurs in the CPU. Although certain intelligent devices, such as disk controllers, can process some instructions, the instructions these devices can handle are limited to the control of data moving to and from the devices. The CPU works from the system clock and executes instructions based on clock signals. The clock rate and type of CPU determine how quickly these instructions are executed.
The CPU usually falls into one of two groups of processors: Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC).
CISC Processors
CISC processors (like the ones Intel builds) are by far the most popular processors. They are more traditional and offer a large instruction set to the program developer. Some of these instructions can be quite complicated; most instructions require several clock cycles to complete.
CISC processors are complex and difficult to build. Because these chips contain millions of internal components, the components are extremely close together. The physical closeness causes problems because there is no room for error. Each year, technology allows more complex and faster chips to be built, but eventually, physics will limit what can be done.
CISC processors carry out a wide range of tasks and can sometimes perform two or more instructions at a time in parallel. CISC processors perform most tasks, such as RDBMS processing, very well.
RISC Processors
RISC processors are based on the principle that if you can reduce the number of instructions processed by the CPU, the CPU can be simpler to build and can run faster. By putting fewer internal components inside the chip, the speed of the chip can be accelerated. One of the most popular RISC chips on the market is the DEC Alpha.
The system compiler determines what instructions are executed on the CPU chips. When the number of instructions was reduced, compilers were written to exploit this and to compensate for the missing instructions.
By reducing the instruction set, RISC manufacturers have been able to increase the clock speed to many times that of CISC chips. Although the faster clock speed is beneficial in some cases, it offers little improvement in others. One effect of a faster CPU is that the surrounding components, such as the L2 cache and memory, must also run faster, which increases cost.
One goal of some RISC manufacturers is to design the chip so that the majority of instructions complete within one clock cycle. Some RISC chips can already do this. But because some operations that require a single instruction for a CISC chip might require many instructions for a RISC chip, a speed-to-speed comparison cannot be made.
CISC versus RISC
Both CISC and RISC processors have their advantages and disadvantages; it's up to you to determine whether a RISC processor or a CISC processor will work best for you. When comparing the two types of processors, be sure to look at performance data and not just clock speed. Although the RISC chips have a much faster clock speed, they do less work per instruction. The performance of the system cannot be determined by clock speed alone.
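The point about clock speed can be sketched with the standard relation: execution time = instruction count x cycles per instruction / clock rate. The Python example below uses made-up figures for two hypothetical chips. The instruction counts, cycle counts, and clock rates are assumptions chosen for illustration, not data for any real processor.

```python
def cpu_time_seconds(instruction_count, cycles_per_instruction, clock_hz):
    """Time to run a program: instructions x cycles per instruction / clock rate."""
    return instruction_count * cycles_per_instruction / clock_hz


if __name__ == "__main__":
    # Hypothetical CISC chip: fewer, more complex instructions, several cycles each.
    cisc = cpu_time_seconds(1_000_000, 4, 200_000_000)
    # Hypothetical RISC chip: twice the clock rate and one cycle per instruction,
    # but the same job needs many more (simpler) instructions.
    risc = cpu_time_seconds(8_000_000, 1, 400_000_000)
    print(f"CISC: {cisc * 1000:.1f} ms, RISC: {risc * 1000:.1f} ms")
```

With these assumed numbers the two chips finish the job in the same time even though one runs at twice the clock rate, which is why measured performance data is the better basis for comparison.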
Multiprocessor Systems
Multiprocessor systems can provide significant performance with very good value. With such a system, you can start with one or two processors and add more as needed. Multiprocessors fall into several categories; two of the main types of multiprocessor systems are the Symmetric Multiprocessor (SMP) system and the Massively Parallel Processing (MPP) system.
SMP Systems
SMP systems usually consist of a standard computer architecture with two or more CPUs that share the system memory, I/O bus, and disks. The CPUs are called symmetric because each processor is identical to any other processor in terms of function. Because the processors share system memory, each processor looks at the same data and the same operating system. In fact, the SMP architecture is sometimes called tightly coupled because the CPUs can even share the operating system.
In the typical SMP system, only one copy of the operating system runs. Each processor works independently by taking the next available job. Because the Oracle architecture is based on many processes working independently, you can see great improvement by adding processors.
The SMP system has these advantages:
* It's cost effective--The addition of a CPU or CPU board is much less expensive than adding another entire system.
* It's high performing--Under most applications, additional CPUs provide an incremental performance improvement.
* It's easily upgradable--Simply add a CPU to the system to instantly and significantly increase performance.
A typical SMP system supports between four and eight CPUs. Because the SMP system shares the system bus and memory, only a certain amount of activity can occur before the bandwidth of the bus is saturated. To add more processors, you must go to an MPP architecture.
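The bus saturation described above can be pictured with a deliberately simple model: total throughput grows with each CPU until the shared bus becomes the limit. The Python sketch below assumes a fixed work rate per CPU and a fixed bus capacity. Both numbers and the function name smp_throughput are hypothetical, and real systems degrade more gradually than this hard cap suggests.

```python
def smp_throughput(cpus, per_cpu_rate, bus_capacity):
    """Work completed per second when every CPU shares one bus.

    Throughput scales with the CPU count until the shared bus is saturated.
    """
    return min(cpus * per_cpu_rate, bus_capacity)


if __name__ == "__main__":
    PER_CPU_RATE = 100   # assumed units of work per second for one CPU
    BUS_CAPACITY = 600   # assumed ceiling imposed by the shared bus
    for cpus in (1, 2, 4, 6, 8):
        print(cpus, "CPUs ->", smp_throughput(cpus, PER_CPU_RATE, BUS_CAPACITY))
```

In this model, going from six to eight CPUs adds nothing because the bus is already full.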
MPP Systems
MPP systems are based on many independent units. Each processor in an MPP system typically has its own resources (such as its own local memory and I/O system). Each processor in an MPP system runs an independent copy of the operating system and its own independent copy of Oracle. An MPP system is sometimes called loosely coupled.
Think of an MPP system as a large cluster of independent units that communicate through a high-speed interconnect. As with SMP systems, you will eventually hit the bandwidth limitations of the interconnect as you add processors. However, the number of processors with which you hit this limit is typically much larger than with SMP systems.
If you can divide the application among the nodes in the cluster, MPP systems can achieve quite high scalability. Although MPP systems can achieve much higher performance than SMP systems, they typically cost much more.
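How much an application gains from more nodes depends on how much of it can actually be divided. One common way to estimate this is Amdahl's law, which the text above does not name. It is brought in here only as an illustration. In the Python sketch below, the parallel fraction and node counts are assumed values, and interconnect overhead is ignored.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Estimated speedup under Amdahl's law.

    parallel_fraction is the share of the work (0.0 to 1.0) that can be
    divided among the nodes; the remainder runs serially.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Hypothetical application in which 95% of the work can be divided.
    for nodes in (4, 16, 64, 256):
        print(f"{nodes:4d} nodes -> speedup {amdahl_speedup(0.95, nodes):.1f}x")
```

Under these assumptions the speedup can never exceed 20x, however many nodes are added, because the 5% that cannot be divided still runs on a single node.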
CPU Cache
Regardless of whether you use a single-processor system, an SMP system, or an MPP system, the basic architecture of the CPUs is similar. In fact, you can find the same Intel processors in both SMP and MPP systems.
As you learned earlier today, the system cache is important to the system. The cache allows quick access to recently used instructions or data. A cache is always used to store and retrieve data more quickly than the next level of storage (the L1 cache is faster than the L2 cache, the L2 cache is faster than main memory, and so on).
By caching frequently used instructions and data, you increase the likelihood of a cache hit. This can save precious clock cycles that would otherwise have been spent retrieving data from memory or disk.
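The effect of caching on hit rates can be shown with a small simulation. The Python sketch below models a cache that evicts the least recently used entry when it is full. That eviction policy, the access patterns, and the function name hit_rate are assumptions made for illustration. The text does not say which policy real CPU caches use.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Fraction of accesses served from a least-recently-used cache."""
    if not accesses:
        return 0.0
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(accesses)


if __name__ == "__main__":
    pattern = [1, 2, 3] * 4  # the same three items requested repeatedly
    for capacity in (2, 3):
        print(f"capacity {capacity}: hit rate {hit_rate(pattern, capacity):.2f}")
```

When the cache is too small to hold the three items being reused, every access misses. With room for all three, only the first access to each item misses. This mirrors the earlier point that a larger cache improves the chance of a cache hit.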
