Session 3: Design Tradeoff Analysis
Characterization, Tracing, and Optimization of Commercial I/O Workloads
  H. Huang, M. Teshome, J. Casmira and D.R. Kaeli, Northeastern University Computer Architecture Research Laboratory
  Brian Garrett and William Zahavi, EMC Corporation
Trace-driven Performance Exploration of a PowerPC 601 OLTP Workload on Wide Superscalar Processors
  J.H. Moreno, M. Moudgill, J.D. Wellman, P. Bose, L. Trevillyan, IBM T.J. Watson Research Center
Performance Analysis of Shadow Directory Prefetching for TPC-C
  Dan Friendly, University of Michigan
  Mark Charney, IBM Research
Evaluating Branch Prediction Methods for an S390 Processor using Traces from Commercial Application Workloads
  Rolf B. Hilgendorf, IBM Entwicklung GmbH, Boeblingen, Germany
  Gerald J. Heim, Wilhelm-Schickard-Institut fuer Informatik, Universitaet Tuebingen, Tuebingen, Germany
Characterization, Tracing and Optimization of Commercial I/O Workloads
David Kaeli
Northeastern University Computer Architecture Research Laboratory
Boston, MA
Participants in this project
NUCAR
  Hua Huang, Melaku Teshome, Jason Casmira
EMC Corporation
  Brian Garrett, William Zahavi
Overview
  Introduction
  ICDA architecture and performance
  I/O tracing and workload characterization
  I/O caching
  Performance modeling and results
  Summary and future work
Introduction
  Commercial applications (e.g., transaction processing) have become heavily dependent on high-speed, reliable mass storage (TB to PB)
  Integrated Cached Disk Arrays (ICDAs, e.g., EMC Symmetrix) have become the design of choice to supply high-end storage needs for business
ICDA Architecture
[Figure: EMC ICDA cache in front of RAID disks]
  Fault tolerant and recoverable
    RAID
    Battery backed up cache
    Dual redundant controllers
    Hot pluggable disks and controllers
  High performance and scaleable
    Large amount of cache memory (4 GB)
    Aggressive caching schemes
    Multi-host connect
    Partitionable
ICDA Performance
  Cache architecture
    Aggressive prefetching
    Fully associative organization
    Complete I/O space directory
  Our work investigates improved cache organization and prefetching to improve I/O response time
  We begin by capturing reference pattern traces of I/O workloads
I/O Tracing
  Options
    Host-based tracing (e.g., TNF on Solaris)
    ICDA resident tracing (e.g., EMC Symmetrix)
    SCSI and Fibre Channel tracing (e.g., I-Tech, Ancott)
  We currently use a SCSI-bus sniffer to capture I/O traces
    Non-intrusive
    Ability to capture multiple I/O streams
    Developed a complete set of post processing tools
SCSI Bus Tracing
[Figure: SCSI bus tracing setup; label: SCSI Probe]
I/O Reference Trace Characteristics
  Variable spatial locality
  Long range temporal locality
  Reasons
    Main memory acts as a filter
    Spatial locality captured in a page
    Majority of the references are for data (versus instructions)
I/O Caching
  Current commercial ICDA designs employ a fully associative cache with LIFO or LRU replacement and a full I/O space directory
  This provides very fast lookup, but the implementation is very expensive
  Instead, we want to reduce this cost by using:
    a set-associative lookup
    an inverted page table organization
    an efficient linked structure for performing lookup
Inverted Page Table
[Figure: inverted page table lookup. Labels: Logical Block Address (TAG, INDEX), Hash Table, linked entries with next, prev, and data pointers]
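The inverted page table lookup can be illustrated with a minimal sketch. It is not the authors' implementation: the class and method names, the modulo hash, the per-class LRU eviction, and the use of a Python list in place of next/prev pointers are all assumptions made for illustration. Only the TAG/INDEX split and the 64-block line size come from the slides.

```python
from dataclasses import dataclass

BLOCKS_PER_LINE = 64  # line size quoted in the slides' figure notes


@dataclass
class Entry:
    tag: int      # upper part of the cache-line number
    data: bytes   # stand-in for a pointer to the cached line


class InvertedPageTable:
    """Toy IPT (hypothetical): one short chain of entries per hash class."""

    def __init__(self, num_classes: int, max_chain: int):
        self.num_classes = num_classes
        self.max_chain = max_chain
        self.chains = [[] for _ in range(num_classes)]

    def _split(self, lba: int):
        # Assumed hash: the low part of the line number is the INDEX, the rest the TAG.
        line = lba // BLOCKS_PER_LINE
        return line // self.num_classes, line % self.num_classes

    def lookup(self, lba: int):
        tag, index = self._split(lba)
        chain = self.chains[index]
        for pos, entry in enumerate(chain):
            if entry.tag == tag:
                chain.insert(0, chain.pop(pos))  # keep most recently used first
                return entry.data
        return None  # miss

    def insert(self, lba: int, data: bytes):
        tag, index = self._split(lba)
        chain = self.chains[index]
        chain[:] = [e for e in chain if e.tag != tag]  # drop a stale copy, if any
        chain.insert(0, Entry(tag, data))
        if len(chain) > self.max_chain:
            chain.pop()  # evict the least recently used entry of this class
```

In this sketch the table has one chain per hash class and at most `max_chain` entries per chain, so its size follows the cache size rather than the size of the addressable disk space.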
I/O Caching
  Existing LUT strategies
    table size is proportional to the size of the addressable disk space
  Linked-List Lookup Structure
    uses an inverted page table structure
    worst case lookup time is bounded by the number of hash classes and chain length
    table size is proportional to the size of the cache
    have also investigated tree-based structures
Lookup Table Size Growth
[Figure: lookup table size (MB, 0 to 10) versus I/O cache size (1 to 16 GB) for IPT and Full LUT; 64GB I/O address space, 64 512B blocks per cache line]
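The two proportionality claims can be checked with a short calculation using the figure's parameters (64GB I/O address space, 64 blocks of 512B per cache line). This is a sketch under assumptions: GB is read as 2^30 bytes, one table entry per cache line is assumed for both organizations, and the per-entry size passed to `table_mb` is hypothetical because the slides do not state it.

```python
BLOCK_BYTES = 512
BLOCKS_PER_LINE = 64
LINE_BYTES = BLOCK_BYTES * BLOCKS_PER_LINE  # 32 KiB per cache line
GIB = 1 << 30


def full_lut_entries(address_space_bytes: int) -> int:
    """Full lookup table: one entry per line of the addressable disk space."""
    return address_space_bytes // LINE_BYTES


def ipt_entries(cache_bytes: int) -> int:
    """Inverted page table: one entry per line that fits in the cache."""
    return cache_bytes // LINE_BYTES


def table_mb(entries: int, entry_bytes: int) -> float:
    """Table size in MiB for a hypothetical per-entry size."""
    return entries * entry_bytes / (1 << 20)


if __name__ == "__main__":
    print("Full LUT entries:", full_lut_entries(64 * GIB))
    for cache_gib in (1, 2, 4, 8, 16):
        print(cache_gib, "GiB cache -> IPT entries:", ipt_entries(cache_gib * GIB))
```

Under these assumptions the full table needs 2,097,152 entries regardless of cache size, while the IPT needs 32,768 entries for a 1 GiB cache and grows linearly with the cache.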
Cache Prefetching
  Detect sequential access
  Miss prefetching
    prefetch next sequential block on a miss
  Tagged prefetching
    miss prefetching + prefetching on first reference to prefetched lines
  Prefetching can introduce cache pollution
Prefetch Buffer Filter
  Small, fully associative table to act as both a stream buffer and a victim cache
  Increases the associativity of hot cache lines
  Uses a Least Recently Prefetched replacement algorithm
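The difference between miss prefetching and tagged prefetching can be sketched with a toy direct-mapped cache simulator. The function name, the policy strings, and the one-line prefetch depth are assumptions for illustration; the prefetch buffer filter is not modeled.

```python
def simulate(lines, num_sets, policy="none"):
    """Return the demand hit rate of a direct-mapped cache over a list of line numbers.

    policy: "none", "miss" (prefetch line+1 on a miss), or "tagged"
    (miss prefetching, plus a prefetch on the first reference to a prefetched line).
    """
    cache = [None] * num_sets        # cache[s] = line number resident in set s
    prefetched = [False] * num_sets  # True until a prefetched line is first referenced
    hits = 0

    def fill(line, is_prefetch):
        s = line % num_sets
        if cache[s] != line:         # leave an already resident line untouched
            cache[s] = line          # may evict a useful line (cache pollution)
            prefetched[s] = is_prefetch

    for line in lines:
        s = line % num_sets
        if cache[s] == line:
            hits += 1
            if policy == "tagged" and prefetched[s]:
                prefetched[s] = False
                fill(line + 1, True)
        else:
            fill(line, False)
            if policy in ("miss", "tagged"):
                fill(line + 1, True)
    return hits / len(lines)
```

On a purely sequential stream, miss prefetching in this sketch hits on every second line, whereas tagged prefetching keeps running one line ahead after the first miss.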
Workloads
  DSSLOAD
    Decision Support System Loading
    Restoring a large database system
    Highly sequential behavior (approx. 70%)
    Approximately 50% reads, 50% writes
    Fixed I/O sizes (32 blocks)
  OLTP
    On-Line Transaction Processing
    Point of sale transactions
    Smaller amount of sequential behavior (approx. 30%)
    Smaller and variable I/O sizes (avg. 8 blocks)
    Approximately 75% reads, 25% writes
  DSSQUERY
    Decision Support System Query
    Querying a large database system
    Highly sequential behavior (approx. 70%)
    All read accesses
    Fixed I/O sizes (32 blocks)
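One way to estimate the degree of sequential behavior of a trace is sketched below. The slides do not say how the quoted percentages were measured, so the definition used here is an assumption: a request counts as sequential when it starts at the block immediately after the previous request's last block. The `(start_block, num_blocks)` record format is also hypothetical.

```python
def sequential_fraction(requests):
    """Fraction of requests that continue exactly where the previous one ended.

    requests: iterable of (start_block, num_blocks) tuples in arrival order.
    """
    prev_end = None
    sequential = 0
    total = 0
    for start, length in requests:
        if prev_end is not None and start == prev_end:
            sequential += 1
        prev_end = start + length  # first block after this request
        total += 1
    return sequential / total if total else 0.0


if __name__ == "__main__":
    trace = [(0, 32), (32, 32), (64, 32), (500, 8), (508, 8)]
    print(sequential_fraction(trace))
```

A trace with several interleaved I/O streams would need per-stream tracking of `prev_end`, which this sketch leaves out.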
DSSLOAD
[Figure: hit rate (55.2 to 56.4) versus cache size (1 to 64 MB) for Full, n=1, and n=2; n = length of hash chain, 64 block line size]
DSSQUERY
[Figure: hit rate (56.4 to 57.8) versus cache size (1 to 64 MB) for Full, n=1, and n=2; n = length of hash chain, 64 block line size]
OLTP
[Figure: hit rate (45.4 to 46.4) versus cache size (1 to 64 MB) for Full, n=1, and n=2; n = length of hash chain, 64 block line size]
DSSLOAD
[Figure: hit rate (40 to 100) versus cache size (128 to 2048 MB) for none, miss, tagged, and pbf; 32 block line size, direct mapped, 4 entry PBF]
DSSQUERY
[Figure: hit rate (30 to 90) versus cache size (128 to 2048 MB) for none, miss, tagged, and pbf; 32 block line size, direct mapped, 4 entry PBF]
OLTP
[Figure: hit rate (30 to 70) versus cache size (128 to 2048 MB) for none, miss, tagged, and pbf; 32 block line size, direct mapped, 4 entry PBF]
Disk Traffic
[Figure: disk requests (in billions, 0 to 4), split into prefetches and misses, for DSSLOAD-tagged, DSSQUERY-tagged, OLTP-tagged, DSSLOAD-pbf, DSSQUERY-pbf, and OLTP-pbf; 512MB cache, 32 block line size, 4 entry PBF]
Summary
  Commercial I/O workloads present a different set of characteristics than processor-based workloads
  IPT cache organizations can provide hit rates similar to full table implementations and can reduce memory requirements
  Prefetching and filtering can greatly enhance low associativity cache organizations
Future Directions
  Cycle-based ICDA model
  Adaptive prefetching schemes
  Multi-stream caching
  COTS NOWs to perform multimedia caching
Superscalar performance exploration
Trace-driven performance exploration of a PowerPC 601 OLTP workload on wide superscalar processors
J.H. Moreno, M. Moudgill, J.D. Wellman, P. Bose, L. Trevillyan
IBM Thomas J. Watson Research Center
[Figure: number of static branches by number of executions per branch]
R.H. G.H. 5
Specifics of ESA/390 ISA
  17 different Branch Instructions
  Majority of the Branches in the Traces are coded as Indirect Branches
  Target Address taken from a General Purpose Register
  Target Address generated by Address Calculation
  Conditional Branch may be coded as Branch-Always
  Unconditional Branch may be coded as Not-To-Branch
  No explicit Call and/or Return Instructions
Prefetch Prediction (Principle)
[Figure: block diagram. Labels: Instruction Cache, Prefetch Address Register, I-Buffer, Branch Target Buffer, Execution Unit, Sequential Address (+), Predicted Branch Target Address, Corrected Address, Branch Prediction Error]
Influence of BTB Size
[Figure: table misses in % of total branches (0 to 20) versus number of entries (2 K to 64 K) for T1, T2, T3, and T4]
Influence of Priming on Table Misses
[Figure: table misses in % of total branches (0 to 20) versus number of entries (2 K to 64 K) for T2 (part) and T2 (part) with priming]
Local History: Static Predictions
[Figure: direction misses in % of total branches (3 to 5) for T1 to T4, comparing 2-bit History, 3-bit History, and 2-bit Saturation Counter]
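The 2-bit saturation counter named in this comparison is a standard scheme and can be sketched as follows. The initial state and the helper `direction_misses` are assumptions for illustration; the slides do not describe how the counters were initialized or how misses were tallied.

```python
class TwoBitCounter:
    """States 0..3: states 0 and 1 predict not taken, states 2 and 3 predict taken."""

    def __init__(self, state=1):  # initial state is an assumption
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome and saturate at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def direction_misses(outcomes, counter=None) -> int:
    """Count wrong direction predictions for one branch over a list of outcomes."""
    if counter is None:
        counter = TwoBitCounter()
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses
```

For a loop-closing branch that is taken nine times and then falls through, the counter in this sketch mispredicts once per loop exit after warm-up, because a single not-taken outcome does not flip the prediction.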
Local History with Adaptive Assignment (Principle)
[Figure: block diagram. Labels: Prefetch Address, Local History Table, History Pattern for that Branch, Pattern Table, Counter Value for that Pattern, Predicted Direction]
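The two-level flow in this principle slide (address selects a history pattern, the pattern selects a counter, the counter gives the direction) can be sketched as below. This is not the authors' hardware model: the class name, the dictionaries standing in for fixed-size tables, the index layout that concatenates address bits with history bits, and the counter initialization are all assumptions.

```python
class LocalHistoryPredictor:
    """Toy two-level predictor: per-branch history selects a 2-bit counter."""

    def __init__(self, history_bits=2, addr_bits=4):
        self.history_bits = history_bits
        self.addr_bits = addr_bits
        self.history = {}   # branch address -> recent outcomes as a bit pattern
        self.counters = {}  # pattern table index -> counter state 0..3

    def _index(self, addr: int) -> int:
        pattern = self.history.get(addr, 0)
        low_addr = addr & ((1 << self.addr_bits) - 1)
        return (low_addr << self.history_bits) | pattern

    def predict(self, addr: int) -> bool:
        return self.counters.get(self._index(addr), 1) >= 2

    def update(self, addr: int, taken: bool) -> None:
        idx = self._index(addr)
        state = self.counters.get(idx, 1)
        self.counters[idx] = min(3, state + 1) if taken else max(0, state - 1)
        mask = (1 << self.history_bits) - 1
        self.history[addr] = ((self.history.get(addr, 0) << 1) | int(taken)) & mask
```

With this structure a strictly alternating branch becomes predictable after warm-up, since each history pattern trains its own counter; a single 2-bit counter cannot learn that pattern.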
Local History with Adaptive Pattern Table
[Figure: direction misses in % of total branches (3 to 5) versus number of instruction address bits used (0 to 12) for 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, and 6-bit history and a 2-bit counter]
Local History with Adaptive Pattern Table
[Figure: direction misses in % of total branches (3 to 4) for T1 to T4, comparing Local 2 bit; 1 Hist, 11 Add bit; 2 Hist, 11 Add bit; and 4 Hist, 10 Add bit]
Using Global History to Access the Pattern Table (Principle)
[Figure: block diagram. Labels: Prefetch Address, XOR, Global History Register ("gshare"), Global Path History Register ("gpshare"), update History, Pattern Table (columns equal branches per fetch group), Counter Value for that Pattern, Predicted Direction]
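The "gshare" indexing named in this slide, where the address is XORed with a global history register to select a pattern table counter, can be sketched as below. Table size, history length, and counter initialization are assumptions, and the "gpshare" variant is not modeled because the slides do not specify how the path history register is formed.

```python
class Gshare:
    """Toy gshare predictor: (address XOR global history) indexes 2-bit counters."""

    def __init__(self, index_bits=10, history_bits=8):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.table = [1] * (1 << index_bits)  # counters start weakly not taken
        self.ghr = 0                          # global history register

    def _index(self, addr: int) -> int:
        return (addr ^ self.ghr) & self.index_mask

    def predict(self, addr: int) -> bool:
        return self.table[self._index(addr)] >= 2

    def update(self, addr: int, taken: bool) -> None:
        i = self._index(addr)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)
        # Shift the outcome into the history shared by all branches.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.history_mask
```

Because the history is shared, a branch whose outcome follows that of the preceding branch lands in a different counter for each preceding outcome, which is what allows correlated branches to be predicted.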
Global History Pattern "gshare"
[Figure: direction misses in % of total branches (3 to 4.5) versus number of history bits used (12 down to 0) for 8 K, 16 K, and 32 K tables and Local 2bit]
Comparing Global- and Path History
[Figure: direction misses in % of total branches (3 to 4) for T1 to T4, comparing Local 2bit, 16 K table "gshare", 16 K table "gpshare", 32 K table "gshare", 32 K table "gpshare", and Local adaptive]
Hybrid Predictor (Principle)
[Figure: block diagram. Labels: Prefetch Address, BTB, Selection Counter, Local History Counter, Global Path History Register, XOR, 0x1110, update History, selected pattern Counter, Predicted Direction]
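The role of the selection counter in a two-way hybrid predictor can be sketched as below. This is a generic chooser, not the design evaluated in the talk: the class name, the per-address dictionary, the initial preference for the local component, and the rule of training only when the components disagree are assumptions.

```python
def saturate(state: int, up: bool) -> int:
    """Step a 2-bit counter up or down, saturating at 0 and 3."""
    return min(3, state + 1) if up else max(0, state - 1)


class HybridSelector:
    """Chooses between a local and a global prediction per branch."""

    def __init__(self):
        # branch address -> selection counter; >= 2 means trust the global component
        self.selection = {}

    def predict(self, addr: int, local_pred: bool, global_pred: bool) -> bool:
        use_global = self.selection.get(addr, 1) >= 2
        return global_pred if use_global else local_pred

    def update(self, addr: int, local_pred: bool, global_pred: bool, taken: bool) -> None:
        # Train the chooser only when the components disagree, toward the correct one.
        if local_pred != global_pred:
            state = self.selection.get(addr, 1)
            self.selection[addr] = saturate(state, up=(global_pred == taken))
```

The component predictions would come from separate local and global predictors; the selector only records which of the two has been right more often for each branch.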
Hybrid Predictor
[Figure: direction misses in % of total branches (2.5 to 4) for T1 to T4, comparing Local 2bit, Local adaptive, "best" gpshare, Hybrid (gpshare) (Local 2bit), and Hybrid (gpshare) (Local adaptive)]
Hybrid Predictor using Multiple Predictors
[Figure: direction misses in % of total branches (2.25 to 3.25) for T1 to T4, comparing Hybrid (Local 2 bit), Path-Global-Local 2-bit, Global-Local-Local, Path-Local-Local, and Path-Global-Local adaptive]
Changing Target Treatment by Adding Cache
[Figure: address misses in % of total branches (0.5 to 3.5) for T1 to T4, comparing: no Special Treatment For Changing Targets; Return Cache; Moving Target Cache Multiple Pattern]
Conclusion
  The Branch Target Buffer should be as large as possible
  Local Adaptive History does not improve Prediction Correctness enough to pay for the Hardware added
  A modest 2-Way Hybrid Predictor seems to have the optimal Cost Performance
  Adding small Return and Moving-Target Caches improves the Prediction Correctness significantly