<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<head>
<title>PAPI</title>
</head>
<body bgcolor="#FFFFFF" text="#000000">
<a name="top"></a>
<center>
<table width="75%">
<tr>
<td>
<center><h2>PAPI FAQ</h2></center><strong><a href="#9">General Questions (FAQ)</a></strong><br />
<a href="#88">I have a question that I think should be added here. Where should I send it?</a><br />
<a href="#166">How do I install the PAPI library?</a><br />
<a href="#167">Where do I go for help?</a><br />
<a href="#70">What are the mailing lists and how do I subscribe?</a><br />
<a href="#165">Where are the archives for the mailing lists?</a><br />
<a href="#80">What is needed to use PAPI?</a><br />
<a href="#81">What tools are available for PAPI?</a><br />
<br />
<strong><a href="#38">The PAPI Library</a></strong><br />
<a href="#218">I downloaded the PAPI 3 tarball last week and keep getting a segmentation fault in gcc. What's up?</a><br />
<a href="#219">When I make PAPI, I always get a warning message when compiling fmultiplex2. Why?</a><br />
<a href="#138">How do I convert my code from PAPI 2 to PAPI 3?</a><br />
<a href="#168">How do I compile PAPI with debugging support?</a><br />
<a href="#169">How do I use the debugging features of the PAPI library?</a><br />
<a href="#72">Why do PAPI_overflow, PAPI_profil and PAPI_sprofil work strangely with a small threshold?</a><br />
<a href="#71">How do I stop PAPI_overflow, PAPI_profil or PAPI_sprofil?</a><br />
<a href="#85">What events does PAPI track?</a><br />
<a href="#83">How does PAPI handle threads?</a><br />
<a href="#84">How does PAPI handle fork/exec?</a><br />
<a href="#78">Does PAPI support unbound or non-kernel threads?</a><br />
<a href="#74">How do I encode a native event?</a><br />
<a href="#82">Why is there more than one patch for Linux?</a><br />
<a href="#77">The numbers are funky for event 0xabc on platform XYZ, help me!</a><br />
<a href="#170">My program runs fine when measuring 1 or 2 events, but when I add more I get a -8, PAPI_ECNFLCT error code. The error text says, "Event exists, but cannot be counted due to hardware resource limitations". What does this mean?</a><br />
<a href="#75">What's multiplexing?</a><br />
<a href="#86">Why am I still getting PAPI_ECNFLCT when using multiplexing?</a><br />
<a href="#171">What's a derived event?</a><br />
<a href="#87">When I compile and run the example program (PAPI_flops.c) on X platform I get the following error message: Error in PAPI_flops: Event exists, but cannot be counted due to hardware resource limits, what is the problem?</a><br />
<a href="#73">Why can't I get my Fortran programs to compile with PAPI on a Cray T3E?</a><br />
<a href="#79">What's wrong with PAPI_LST_INS (hex code 0x43) on my Pentium?</a><br />
<a href="#216">PAPI_create_eventset always returns an error now.</a><br />
<a href="#233">What's this GCC error about "thread local storage not supported for this target"?</a><br />
<br />
<strong><a href="#65">The PAPI GIT Source Repository</a></strong><br />
<a href="#325">Can I browse the source repository on the web?</a><br />
<a href="#326">How do I download a copy of the current PAPI source tree?</a><br />
<a href="#327">Can I commit changes to the PAPI repository?</a><br />
<a href="#328">Are there GUI interfaces available for GIT?</a><br />
<a href="#329">Where can I learn more about GIT?</a><br />
<br />
<strong><a href="#40">PAPI on AIX POWER Processors</a></strong><br />
<a href="#177">General Comments</a><br />
<a href="#178">Installation notes</a><br />
<a href="#179">Test case notes</a><br />
<a href="#180">Counter notes</a><br />
<a href="#235">Things go haywire on my Power/AIX box with threaded programs?</a><br />
<br />
<strong><a href="#45">Linux-IA64</a></strong><br />
<a href="#189">Floating Point</a><br />
<a href="#192">Notes on PAPI->Native event mappings</a><br />
<a href="#220">Why am I getting errors from perfmon and PAPI on my Redhat kernels?</a><br />
<a href="#234">Counter interrupts seem to have stopped on my threaded programs?</a><br />
<a href="#264">Why can't I build PAPI with the Intel icc compiler?</a><br />
<br />
<strong><a href="#46">Linux-Perfctr</a></strong><br />
<a href="#209">PAPI and the Linux Kernel</a><br />
<a href="#210">Before you compile</a><br />
<a href="#211">If you have already patched your kernel</a><br />
<a href="#121">How do I patch my Linux/Pentium I, II, III, IV, AMD K7, K8 box to work with PAPI?</a><br />
<a href="#265">After reboot, the /dev/perfctr file always seems to have the wrong permissions and PAPI fails to initialize. What's going on?</a><br />
<a href="#212">Hardware interrupt driven counters</a><br />
<a href="#259">Why do PAPI_LD_INS and PAPI_SR_INS give identical results on Pentium 4?</a><br />
<a href="#213">Floating point counts on the Pentium 4 series</a><br />
<a href="#214">Vector instruction counts on the Pentium 4 series</a><br />
<a href="#215">The memory test sometimes fails on Athlon Processors.</a><br />
<a href="#262">Floating Point counts on AMD Opteron</a><br />
<br />
<strong><a href="#47">Solaris-Ultra</a></strong><br />
<a href="#193">General Comments</a><br />
<a href="#194">Bugs</a><br />
<a href="#76">My Sun box doesn't have libcpc.h. What should I do?</a><br />
</td>
</tr>
<tr>
<td>
<hr />
</td>
</tr>
</table>
</center>
<center>
<table width="75%">
<tr>
<td>
<a name="9"></a>
<h2>General Questions (FAQ)</h2>
<font color="#666"><p></p></font>
<a name="88"></a>
<strong>I have a question that I think should be added here. Where should I send it?</strong>
<blockquote><p><a href="mailto:[email protected]">[email protected]</a>.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="166"></a>
<strong>How do I install the PAPI library?</strong>
<blockquote><p>Please see INSTALL.txt in the papi root directory.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="167"></a>
<strong>Where do I go for help?</strong>
<blockquote><p>First, read this document thoroughly. Then consult the PAPI Home Page
at <a href="../">http://icl.cs.utk.edu/papi</a>.
If that doesn't help, search the mailing list archives mentioned below. If
that fails, send mail to one of the two mailing lists, <a href="mailto:[email protected]">[email protected]</a> or <a href="mailto:[email protected]">[email protected]</a>.
The former is a group for general announcements, questions and
miscellaneous
topics. The latter is a discussion group for
the developers of PAPI; it also receives all CVS update messages (which
can be a significant amount of mail!).</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="70"></a>
<strong>What are the mailing lists and how do I subscribe?</strong>
<blockquote><p>There are currently two mailing lists: ptools-perfapi, a group for general announcements, questions and miscellaneous topics, and perfapi-devel, a discussion group for the developers of PAPI that also receives all CVS update messages (which can be a significant amount of mail!).</p>
<p>To subscribe to or maintain your subscription to either of the above groups, go to:<br /> <a href="http://lists.eecs.utk.edu/mailman/listinfo/ptools-perfapi">lists.eecs.utk.edu/mailman/listinfo/ptools-perfapi</a> or <a href="http://lists.eecs.utk.edu/mailman/listinfo/perfapi-devel">lists.eecs.utk.edu/mailman/listinfo/perfapi-devel</a>.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="165"></a>
<strong>Where are the archives for the mailing lists?</strong>
<blockquote><p>The archives for the general PAPI mailing list are located at <a href="http://lists.eecs.utk.edu/pipermail/ptools-perfapi/">lists.eecs.utk.edu/pipermail/ptools-perfapi/</a>.
The archives for the developers list are located at <a href="http://lists.eecs.utk.edu/pipermail/perfapi-devel/">lists.eecs.utk.edu/pipermail/perfapi-devel/</a>.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="80"></a>
<strong>What is needed to use PAPI?</strong>
<blockquote><p>See the Platform section at <a
href="http://icl.cs.utk.edu/papi/custom/index.html?lid=62&slid=96">http://icl.cs.utk.edu/papi/custom/index.html?lid=62&slid=96</a>.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="81"></a>
<strong>What tools are available for PAPI?</strong>
<blockquote><p>Some of the more popular tools using PAPI can be found under the Tools link on the PAPI web page at <a href="http://icl.cs.utk.edu/papi">http://icl.cs.utk.edu/papi/</a>.
You can also see the latest list of third party tools and related software at <a href="http://icl.cs.utk.edu/papi/links/index.html">http://icl.cs.utk.edu/papi/links/index.html</a>. If you have a tool to be posted, send it to the mailing list.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<br />
<a name="38"></a>
<h2>The PAPI Library</h2>
<font color="#666"><p></p></font>
<a name="218"></a>
<strong>I downloaded the PAPI 3 tarball last week and keep getting a segmentation fault in gcc. What's up?</strong>
<blockquote><p>Some versions of GCC have a bug that is triggered by a statement in PAPI 3.0. The one-character fix is in the current tarball, but may not be in the one you downloaded.</p><p>If you see an INTERNAL ERROR from GCC when compiling multiplex.c, do two things:</p><p>1) Edit multiplex.c, line 1021, to have 2 equal signs instead of 1.</p><p>2) (Optional) Send a message to your local gcc maintainer and complain.</p><p>The actual culprit is:</p><p>assert(retval = PAPI_OK)</p><p>and it should be:</p><p>assert(retval == PAPI_OK)</p><p>Of course, both are legal C and nothing should trigger an internal compiler error, but hey...</p><p>P.S. If your current release compiled with GCC, you're still OK, as the statement above NEVER gets triggered. It is there as an artifact from the original multiplex.c implementation, so you don't need to change or upgrade your PAPI or gcc.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="219"></a>
<strong>When I make PAPI, I always get a warning message when compiling fmultiplex2. Why?</strong>
<blockquote><p>The warning message here is benign, but since it occurs on the last file to be compiled, it often looks like the build has been aborted.
The reason the message occurs is that the compiler thinks it is trying to stuff too many bits into an integer value. You can fix it by rearranging the code a little bit, or just download the latest copy of fmultiplex2.F from the CVS tree.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="138"></a>
<strong>How do I convert my code from PAPI 2 to PAPI 3?</strong>
<blockquote><p>PAPI 3 represents a major upgrade to the PAPI library.
Because of this, there have been a number of interface changes. The process to upgrade from PAPI 2 to PAPI 3 is straightforward, and documented in the <a href="http://icl.cs.utk.edu/projects/papi/files/documentation/PAPI_Conversion_Cookbook.htm">PAPI Conversion Cookbook</a>. You can read it online, or <a href="http://icl.cs.utk.edu/papi/custom/index.html?lid=49&slid=79">download</a> it in a number of different formats.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="168"></a>
<strong>How do I compile PAPI with debugging support?</strong>
<blockquote><p>To compile with debugging, define CFLAGS to include -DDEBUG in the corresponding Makefile or Rules.&lt;platform&gt; file.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="169"></a>
<strong>How do I use the debugging features of the PAPI library?</strong>
<blockquote><p>To enable debugging messages at run time, set the PAPI_DEBUG environment variable to one or more of the following with any character as a separator.</p><p>SUBSTRATE<br>
API<br>
INTERNAL<br>
THREADS<br>
MULTIPLEX<br>
OVERFLOW<br>
PROFILE<br>
ALL</p><p>Also, see the man page for <a href="http://icl.cs.utk.edu/projects/papi/files/html_man3/papi_set_debug.html">PAPI_set_debug()</a>.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="72"></a>
<strong>Why do PAPI_overflow, PAPI_profil and PAPI_sprofil work strangely with a small threshold?</strong>
<blockquote><p>On most systems, overflow must be emulated in software by PAPI. Only on the UltraSparc III, Itanium and IRIX does the operating system support true interrupt on overflow. On most platforms the user is therefore advised to make sure the overflow value is no more than 1/1000th the clock rate. The emulation handler in PAPI runs every millisecond, so the goal of the tool designer should be to pick a value that will overflow frequently, but not too frequently. Not following these guidelines could result in either overflows never occurring, or overflows occurring on every interrupt and thus producing a flat profile.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="71"></a>
<strong>How do I stop PAPI_overflow, PAPI_profil or PAPI_sprofil?</strong>
<blockquote><p>Call PAPI_stop, and then call PAPI_overflow, PAPI_profil or PAPI_sprofil with a threshold value of 0. Since PAPI 3 can overflow and profile on multiple events, you must call the above routines for EACH event that had been previously enabled for overflow or profile.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="85"></a>
<strong>What events does PAPI track?</strong>
<blockquote><p>PAPI only tracks 'hardware events', the occurrence of signals onboard the microprocessor. It does not count system calls, software interrupts or other software events. The user should remember that by default, PAPI only measures events that occur in User Space.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="83"></a>
<strong>How does PAPI handle threads?</strong>
<blockquote><p>Currently, PAPI only supports thread level measurements with kernel or bound threads. Each thread must create, manipulate and read its own counters. When a thread is created, it inherits no PAPI events or information from the calling thread.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="84"></a>
<strong>How does PAPI handle fork/exec?</strong>
<blockquote><p>When a process is created, it inherits no PAPI information from the calling thread.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="78"></a>
<strong>Does PAPI support unbound or non-kernel threads?</strong>
<blockquote><p>Yes, but the counts will reflect the total events for the process. Measurements done in other threads will all get the same values, namely the counts for the whole process. For non-bound threads, it is not necessary to call PAPI_thread_init. In most scenarios, however, such as with SMP or OpenMP compiler directives, bound threads will be the default. Those using Pthreads should take care to set the scope of each thread to the PTHREAD_SCOPE_SYSTEM attribute, unless the system is known to have a non-hybrid thread library implementation, like Linux.</p></blockquote>
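<p>For Pthreads users, requesting system contention scope might look like the minimal sketch below. It uses only the POSIX calls pthread_attr_setscope and pthread_create. The names run_bound_thread and worker are hypothetical, and the PAPI calls a real worker would make are left out.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: a real one would create, start and read its own
 * PAPI event set here. This one only records that it ran. */
static void *worker(void *arg)
{
    int *ran = arg;
    *ran = 1;
    return NULL;
}

/* Create one thread with system contention scope (a bound thread) and
 * wait for it. Returns 0 on success, otherwise a pthreads error code. */
int run_bound_thread(int *ran)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    assert(ran != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for a kernel-scheduled (bound) thread. */
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, ran);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, NULL);
}
```

<p>The sketch checks the return value of pthread_attr_setscope before creating the thread, because a platform that cannot provide the requested scope reports it there.</p>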
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="74"></a>
<strong>How do I encode a native event?</strong>
<blockquote><p>In PAPI 2.0:
Unless otherwise stated in the FAQ section for your platform, the encoding is as follows:</p><p>event = ((reg_code &amp; 0xffffff) &lt;&lt; 8 | (reg_num &amp; 0xff))</p><p>In PAPI 3.0:
Just find the native event name and then call PAPI_event_name_to_code. The code returned can be added directly to an event set.
The native events can be listed with the test case 'native_avail' in the ctests directory.</p></blockquote>
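<p>The PAPI 2 formula above can be written as a small C helper. This is only a sketch of the bit packing: the function names are hypothetical and not part of the PAPI API, and the two inverse helpers are added purely for illustration.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Pack a register code (low 24 bits kept) and a register number
 * (low 8 bits kept) into one 32-bit native event value. */
uint32_t encode_native_event(uint32_t reg_code, uint32_t reg_num)
{
    return ((reg_code & 0xffffffu) << 8) | (reg_num & 0xffu);
}

/* Hypothetical inverse helpers: recover the two fields again. */
uint32_t native_event_reg_code(uint32_t event)
{
    return event >> 8;
}

uint32_t native_event_reg_num(uint32_t event)
{
    return event & 0xffu;
}

/* Round-trip self-check using arbitrary example values. */
void check_native_event_encoding(void)
{
    uint32_t event = encode_native_event(0x12u, 0x3u);

    assert(native_event_reg_code(event) == 0x12u);
    assert(native_event_reg_num(event) == 0x3u);
}
```

<p>Bits outside the two masks are discarded, so a register code wider than 24 bits or a register number wider than 8 bits is silently truncated.</p>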
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="82"></a>
<strong>Why is there more than one patch for Linux?</strong>
<blockquote><p>There are numerous patches designed to provide access to the Intel CPU performance counters. As PAPI began, we used the original Beowulf patch (perf) by David Hendriks. However, as PAPI progressed, we needed some additional features, which I graciously added. This patch used a system call approach and has proven to be exceedingly stable. Yes, no crashes reported. I knew that there was a better way to design a performance counter kernel patch, one that used mmap() to provide direct access to the virtual counts. Mikael Pettersson provided me with exactly that in the form of the perfctr patch. It is also very, very stable. It can be found at <a href="http://user.it.uu.se/~mikpe/linux/perfctr">http://user.it.uu.se/~mikpe/linux/perfctr</a>. If you're starting with PAPI for the first time, we recommend the perfctr patch as included in the papi source distribution.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="77"></a>
<strong>The numbers are funky for event 0xabc on platform XYZ, help me!</strong>
<blockquote><p>This is not a question, but I'll help you. We, the PAPI developers, cannot be experts on the thousands of events found across all supported platforms. However, if you are using a PAPI preset, the first thing to do is to look up the corresponding native event code using the test case 'avail'. Then the best bet is always to go to the vendor's technical documentation site and check the processor reference manual. If you're convinced everything is kosher, then please feel free to send a message to the mailing list and one of the members may be able to help you.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="170"></a>
<strong>My program runs fine when measuring 1 or 2 events, but when I add more I get a -8, PAPI_ECNFLCT error code. The error text says, "Event exists, but cannot be counted due to hardware resource limitations". What does this mean?</strong>
<blockquote><p>You have either exceeded the number of available hardware counters or two or more of the events you want to count need the same resources. This can be particularly annoying on machines like the Pentium 4. Although the P4 has 18 nominal counter registers, many events require resources that are restricted to 2 or 3 of these counters. In practice it is often difficult to count more than 4 or 5 simultaneous events on this platform.
One way around limited counter resources is to use multiplexing.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="75"></a>
<strong>What's multiplexing?</strong>
<blockquote><p>Many systems have only a few hardware performance counter registers; thus you can only measure a few metrics at once. Some platforms may support counter multiplexing, which gives the user the illusion of a larger number of registers by time sharing the performance registers. On the MIPS R10K series, the IRIX kernel supports multiplexing, allowing up to 32 events to be counted at once. On other platforms PAPI does the multiplexing itself, swapping events in and out of the counters based on a timer interrupt. Don't take fine-grained measurements when multiplexing, unless you know what you're doing.</p></blockquote>
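<p>One way to see why fine-grained measurements are risky is to look at how a time-shared count could be scaled up to the full interval. The sketch below assumes a simple linear extrapolation. It does not reproduce PAPI's internal code, and the function name and its parameters are hypothetical.</p>

```c
#include <assert.h>

/* Estimate the count for a whole measurement interval from a raw count
 * gathered only while the event actually occupied a hardware counter.
 * active_ticks: time slices during which the event was being counted.
 * total_ticks:  time slices in the whole measurement interval. */
long long estimate_multiplexed_count(long long raw_count,
                                     long long active_ticks,
                                     long long total_ticks)
{
    assert(active_ticks >= 0 && total_ticks >= active_ticks);

    if (active_ticks == 0)
        return 0; /* the event never got a time slice: nothing to scale */

    /* Linear extrapolation: assumes the event rate was the same while
     * the event was swapped out as while it was being counted. */
    return (long long)((double)raw_count * (double)total_ticks
                       / (double)active_ticks);
}
```

<p>Over a short region an event may get only a handful of time slices, or none, so the estimate is either dominated by the assumption of a constant event rate or has nothing to scale.</p>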
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="86"></a>
<strong>Why am I still getting PAPI_ECNFLCT when using multiplexing?</strong>
<blockquote><p>PAPI multiplexing currently always uses one hardware counter for Total Cycles. If you are trying to multiplex a derived event on hardware with only two physical counters then you will get a PAPI_ECNFLCT error. This happens on the Intel Pentium IIIs for example.</p><p>Also, enabling multiplexing is a two-step process. You must call PAPI_multiplex_init() to initialize multiplexing system-wide. You must also call PAPI_set_multiplex() for *each* event set that you want to count in multiplexed mode. If you try to add too many events to an event set where multiplexing has not been set, a PAPI_ECNFLCT error will result.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="171"></a>
<strong>What's a derived event?</strong>
<blockquote><p>Hardware counters count low level events that can be directly measured in the hardware. Often these low level events must be combined to form meaningful PAPI preset events. This linear combination of low level events is called a derived PAPI event. Derived events are usually formed by adding or subtracting 2 'native' events, but occasionally derived events can contain 4 or more terms.</p></blockquote>
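<p>The linear combination described above can be sketched in a few lines of C. The struct and function names are hypothetical and do not correspond to PAPI's internal data structures. The sketch only shows the arithmetic of adding and subtracting native counts.</p>

```c
#include <assert.h>
#include <stddef.h>

/* One term of a derived event: a signed coefficient (typically +1 or -1)
 * applied to the value read from one native event. */
struct derived_term {
    int coeff;
    long long native_count;
};

/* Combine the native counts into the value of the derived event. */
long long derived_event_value(const struct derived_term *terms, size_t nterms)
{
    long long value = 0;

    assert(terms != NULL || nterms == 0);

    for (size_t i = 0; i < nterms; i++)
        value += terms[i].coeff * terms[i].native_count;

    return value;
}
```

<p>Each term needs a native count of its own, so a derived event with two terms occupies two hardware counters.</p>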
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="87"></a>
<strong>When I compile and run the example program (PAPI_flops.c) on X platform I get the following error message: Error in PAPI_flops: Event exists, but cannot be counted due to hardware resource limits, what is the problem?</strong>
<blockquote><p>Hardware counters are a limited resource. Some PAPI preset events are derived, and require the use of more than one hardware counter. For example, Solaris has 2 counters, both of which are needed to count Floating point instructions. Flops also uses total cycles to measure time. On Solaris this would mean using 3 counters, and those resources aren't available.<br>
If you get this error on any platform, run the avail program in the ctests directory and see how many native events have to be monitored. PAPI_num_counters() can be used to determine how many counters exist on your platform. If there are more native events than counters, then this is the reason you are getting the error.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="73"></a>
<strong>Why can't I get my Fortran programs to compile with PAPI on a Cray T3E?</strong>
<blockquote><p>The Fortran header file you include has to be preprocessed before the Fortran file can use it. To have the cpp process the file before sending the file to the compiler, add the -F flag. For example:</p><p>f90 -F test.F -o test</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="79"></a>
<strong>What's wrong with PAPI_LST_INS (hex code 0x43) on my Pentium?</strong>
<blockquote><p>According to the Intel documentation, the counts from this event do not intuitively match its description. Older releases of PAPI had this preset available in the Intel ports, but it has since been removed. It does appear to work on the AMD Athlon.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="216"></a>
<strong>PAPI_create_eventset always returns an error now.</strong>
<blockquote><p>The EventSet MUST be set to PAPI_NULL before it is passed into PAPI_create_eventset.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="233"></a>
<strong>What's this GCC error about "thread local storage not supported for this target"?</strong>
<blockquote><p>TLS is thread local storage, a high performance mechanism in later versions of GCC, GLIBC and pthreads that provides constant-time access to per-thread data. PAPI uses this if available.</p><p>However, many systems (especially IA64 running Debian or SuSE) provide very poor, buggy or non-existent support for this. If you're getting an error during compile (or seg faults on every program during the run), then please rebuild using ./configure.</p><p>Other systems don't bother to ship a gcc with this turned on, so you'll get the above error.</p><p>./configure has a test to make sure that the thread support is working on your platform.</p><p>If you find a case where configure did not detect a broken
__thread implementation, please report it to us.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<br />
<a name="65"></a>
<h2>The PAPI GIT Source Repository</h2>
<a name="325"></a>
<strong>Can I browse the source repository on the web?</strong>
<blockquote><p>Yes. The latest copy of the PAPI source tree is viewable through a web based source browser here:<br /><a href="../../trac/papi/browser">http://icl.cs.utk.edu/trac/papi/browser</a></p>
<p>This source browser is also accessible through the Trac bug reporting system or directly through the <a href="../../trac/papi/browser">Browse Source</a> link in the menu to the left.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="326"></a>
<strong>How do I download a copy of the current PAPI source tree?</strong>
<blockquote><p>Make sure git is installed on your machine. You can download a copy <a href="http://git-scm.com/">here</a>.</p>
<p>Download the PAPI repository the first time with the following command:</p>
<p>> git clone https://icl.cs.utk.edu/git/papi.git</p>
<p>This creates a complete copy of the papi git repository on your computer in a folder called 'papi'.</p>
<p>To make sure your copy is up to date with the repository:</p>
<p>> cd papi<br />> git pull https://icl.cs.utk.edu/git/papi.git</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="327"></a>
<strong>Can I commit changes to the PAPI repository?</strong>
<blockquote><p>You can always commit changes to your local copy of the PAPI repository using the "git commit" command and its variations. You cannot push those changes to the master copy of the repository without obtaining credentials from the PAPI team.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="328"></a>
<strong>Are there GUI interfaces available for GIT?</strong>
<blockquote><p>There are a large number of GUI interfaces to GIT. Some are free and some are commercial. Some are operating system specific and some are cross-platform. Since preferences in GUIs tend to be very personal, no recommendations will be made here. Google is your friend.</p>
<p>The author has had good success (so far) with SourceTree for Macintosh, available free at: <a href="http://www.sourcetreeapp.com/">http://www.sourcetreeapp.com/</a></p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="329"></a>
<strong>Where can I learn more about GIT?</strong>
<blockquote><p>The web has a variety of resources targeted at teaching you how to use GIT. A good place to start is the official <a href="http://git-scm.com/">GIT</a> site.<br />History and background of GIT can be found on <a href="http://en.wikipedia.org/wiki/Git_%28software%29">Wikipedia</a>.<br /><a href="http://rogerdudler.github.com/git-guide/">This</a> user-friendly introduction might help "git" you started.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<br />
<a name="40"></a>
<h2>PAPI on AIX POWER Processors</h2>
<font color="#666"><p></p></font>
<a name="177"></a>
<strong>General Comments</strong>
<blockquote><p>If you are running papi-3.0 on the AIX 5.2 and POWER4 combination and seeing
failures, they are most likely caused by a bug in the kernel. You need to look for the efix for
APAR IY57280, or contact the PAPI team at [email protected] for the fix. Here is more
precise information from IBM:<br>
the problem was introduced in 5.2 ML3, and fixed in 5.2 ML4 and 5.3.<br>
<br>
To use PAPI in 64-bit mode on POWER4:<br>
make -f Makefile.aix-power4-64bit<br>
link your program with
libpapi64.a or libpapi64.so<br>
<br>
See: /usr/lpp/pmtoolkit/lib/&lt;arch&gt;.evs for POWER3;<br>
/usr/pmapi/lib/POWER4.evs and /POWER4.gps for
POWER4<br>
<br>
For threaded programs, you should set:<br>
<br>
setenv AIXTHREAD_SCOPE S<br></p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="178"></a>
<strong>Installation notes</strong>
<blockquote><p>AIX 4.3.x:<br>
The current source and Makefile are for pmtoolkit 1.3.<br>
If you have pmtoolkit 1.2 the test cases will fail. For example:<br>
<br>
./tests/avail<br>
IOT trap<br>
<br>
This can be remedied by recompiling the PAPI library with the option<br>
-DPMTOOLKIT_1_2 set.<br>
<br>
AIX 5.x:<br>
The current source is for pmapi 1.4<br>
<br>
The aix-power substrate is contained in a single source file, but
targets<br>
three different configurations.<br>
Conditional compilation directed by three different make files
determines<br>
which configuration is targeted. Make sure you select the Makefile that<br>
matches your configuration:<br>
- Makefile.aix-power for AIX 4.3.x on POWER3<br>
- Makefile.aix5-power3 for AIX 5.x on POWER3<br>
- Makefile.aix-power4 for AIX 5.x on POWER4<br></p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="179"></a>
<strong>Test case notes</strong>
<blockquote><p>The POWER3 and POWER4 have an FMADD instruction. Although this
instruction<br>
performs two Floating Point operations, it is counted as one Floating
Point<br>
instruction. Because of this, there are situations where PAPI_FP_INS may<br>
produce fewer Floating Point counts than expected.<br>
Further, the Floating Point Instruction event on POWER3 and POWER4 also<br>
counts Floating Point Stores, leading to higher Floating Point counts
than<br>
expected. There are occasions where these two effects can cancel each
other<br>
out, to produce the right result for the wrong reason!<br>
Note that POWER3 and POWER4 also support an FMA counter (PAPI_FMA_INS).<br>
Thus, a more accurate count of Floating Point Operations can be obtained<br>
by PAPI_FP_INS + PAPI_FMA_INS.<br>
Correcting for the overcount by Floating Point Stores is more difficult,<br>
requiring the use of the native events: PM_FPU_LD_ST_ISSUES and
PM_FPU_LD.<br>
The complete expression for Floating Point Operations then becomes:<br>
PAPI_FP_INS + PAPI_FMA_INS - (PM_FPU_LD_ST_ISSUES - PM_FPU_LD)<br></p></blockquote>
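<p>The complete expression above can be applied to raw counter readings with a small helper. This is only a sketch: the function name <code>corrected_fp_ops</code> and the sample numbers are made up for illustration, and it assumes the four counts were collected over the same region of code.</p>

```python
def corrected_fp_ops(fp_ins, fma_ins, fpu_ld_st_issues, fpu_ld):
    """Apply the FAQ's correction to raw POWER3/POWER4 counter readings.

    fp_ins           -- PAPI_FP_INS reading
    fma_ins          -- PAPI_FMA_INS reading
    fpu_ld_st_issues -- PM_FPU_LD_ST_ISSUES native event reading
    fpu_ld           -- PM_FPU_LD native event reading
    """
    # Each FMADD is counted once in PAPI_FP_INS but performs two operations,
    # so adding PAPI_FMA_INS accounts for the second operation.
    # PM_FPU_LD_ST_ISSUES - PM_FPU_LD is the FAQ's estimate of the
    # floating point stores that inflate PAPI_FP_INS.
    fp_stores = fpu_ld_st_issues - fpu_ld
    return fp_ins + fma_ins - fp_stores


if __name__ == "__main__":
    # Hypothetical readings, not measured on real hardware.
    print(corrected_fp_ops(fp_ins=1000, fma_ins=200,
                           fpu_ld_st_issues=500, fpu_ld=350))
```

<p>With the hypothetical readings shown, the store correction is 500 - 350 = 150, so the helper returns 1000 + 200 - 150 = 1050.</p>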
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="180"></a>
<strong>Counter notes</strong>
<blockquote><p>The POWER architecture supports up to 8 counters. However, in many cases<br>
events are mutually exclusive and can't be counted simultaneously.<br>
<br>
On POWER4, events are available only as members of predefined groups.<br>
For more on these groups, see /usr/pmapi/lib/POWER4.gps.<br>
<br>
The following table, submitted by Joel Malard, indicates<br>
events that cannot be counted simultaneously on POWER3:</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="235"></a>
<strong>Things go haywire on my Power/AIX box with threaded programs?</strong>
<blockquote><p>It is very important that you set the environment variable
AIXTHREAD_SCOPE to "S", which disables user level threads.</p></blockquote>
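<p>A launcher script can check this setting before starting a threaded, instrumented program. The sketch below is an illustration only; the helper name <code>aix_thread_scope_is_system</code> is hypothetical and is not part of PAPI.</p>

```python
import os


def aix_thread_scope_is_system(env=None):
    """Return True if AIXTHREAD_SCOPE is set to "S" in the given environment.

    env defaults to the current process environment; a plain dict can be
    passed instead, which is convenient for testing.
    """
    env = os.environ if env is None else env
    return env.get("AIXTHREAD_SCOPE") == "S"


if __name__ == "__main__":
    if not aix_thread_scope_is_system():
        print("warning: AIXTHREAD_SCOPE is not S; "
              "threaded PAPI programs may misbehave on AIX")
```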
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<br />
<a name="45"></a>
<h2>Linux-IA64</h2>
<font color="#666"><p></p></font>
<a name="189"></a>
<strong>Floating Point</strong>
<blockquote><p>This version of the substrate always scales PME_FP_OPS_RETIRED_HI, hex code 0xa, even if you are using it as a NATIVE event. Previous versions of PAPI did not scale this event and could produce erroneously low counts for
PAPI_FP_OPS or PAPI_FP_INS.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="192"></a>
<strong>Notes on PAPI->Native event mappings</strong>
<blockquote><p>PAPI_CA_SNP<br>
PAPI_CA_INV<br>
Only counts snoops and invalidations from the local processor.<br>
PAPI_TLB_TL<br>
Counts "real" TLB misses, i.e. misses that cause a VHPT walk or a TLB<br>
miss trap to the OS. Misses in the L1 TLBs are not counted.<br>
PAPI_FP_STAL<br>
Counts stalls due to register dependencies and load latencies.<br>
If the FP pipeline can stall for some other reason (I don't know)<br>
then those stall cycles won't be counted.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="220"></a>
<strong>Why am I getting errors from perfmon and PAPI on my Redhat kernels?</strong>
<blockquote><p>Redhat broke the perfmon kernel interface in their kernels and thus only enabled it for root. In some kernels, it's disabled entirely. You can test this by running your PAPI program as root; if it then works, guess what: you have a broken kernel.</p><p>The fix is supposed to be in the latest update to RHEL3 and RHEL4. The best thing to do would be to download a kernel.org kernel, rebuild and go.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="234"></a>
<strong>Counter interrupts seem to have stopped on my threaded programs?</strong>
<blockquote><p>You are probably on an Altix or a system with a Redhat kernel. The solution for the latter is to replace the kernel you have with a patched kernel.org kernel, discussed in this section.</p><p>Please send us the kernel version if this happens to you. You'll notice it by running the profile_pthreads test case.</p><p>If you're an Altix user, then it's best to complain to SGI. But please let us know also.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="264"></a>
<strong>Why can't I build PAPI with the Intel icc compiler?</strong>
<blockquote><p>The problem is not in PAPI, but in libpfm 3.x. When this library is built using icc, the file pfmlib_gen_ia64.c generates a series of errors. One workaround for this may be to make the libpfm library separately using gcc and then build PAPI with icc. Or just use gcc.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<br />
<a name="46"></a>
<h2>Linux-Perfctr</h2>
<font color="#666"><p></p></font>
<a name="209"></a>
<strong>PAPI and the Linux Kernel</strong>
<blockquote><p>For Linux kernels more recent than 2.6.32, the perf_events interface is built into the kernel and can be used directly.</p>
<p>For Linux kernels before 2.6.32, PAPI requires your Linux kernel to be patched with either the PerfCtr patch or the Perfmon patch. For compatibility reasons, we have included both of these patches in the tarball. You should patch your kernel with PerfCtr using the distribution found in the papi/src/perfctr-2.6.x directory for x86 hardware, and the papi/src/perfctr-2.7.x directory for IBM POWER hardware.
If you prefer Perfmon for kernels older than 2.6.30, you should use the distribution found in the papi/src/libpfm-3.y directory. Perfmon is no longer supported as a Linux patch.
The most recent Perfctr distribution can be obtained from Mikael Pettersson's web site, although it is no longer actively supported and not guaranteed to work: <a href="http://user.it.uu.se/~mikpe/linux/perfctr/">http://user.it.uu.se/~mikpe/linux/perfctr/</a><br /> If you're not sure how to patch, recompile and reinstall your Linux kernel, there are a variety of resources on the web.
Here's one that should help: <a href="http://answers.oreilly.com/topic/36-how-to-patch-a-linux-kernel/">http://answers.oreilly.com/topic/36-how-to-patch-a-linux-kernel/</a>.</p>
<p> </p></blockquote>
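<p>The kernel version rule described above can be checked from a script. This is a rough sketch, not part of PAPI: the function name <code>counter_interface</code> and its return strings are hypothetical, and it assumes that 2.6.32 itself already counts as a perf_events kernel, which the text above does not state explicitly.</p>

```python
import re


def counter_interface(kernel_release):
    """Classify a kernel release string such as '2.6.18-92.el5'.

    Returns 'perf_events' for 2.6.32 and later (assumption: 2.6.32 itself
    is included), otherwise 'patch required' (PerfCtr or Perfmon).
    """
    # Take the leading major.minor[.patch] numbers and ignore any
    # distribution suffix such as '-92.el5'.
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", kernel_release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % kernel_release)
    version = tuple(int(part) if part else 0 for part in match.groups())
    return "perf_events" if version >= (2, 6, 32) else "patch required"


if __name__ == "__main__":
    import platform
    # platform.release() reports the running kernel release on Linux.
    print(counter_interface(platform.release()))
```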
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="210"></a>
<strong>Before you compile</strong>
<blockquote><p>cd perfctr<br>
more INSTALL<br>
If you're getting compilation errors regarding not being able to find include files, then you're probably running a broken redhat installation.</p><p>Edit the path to your kernel include files at the top of either Makefile.linux-perfctr</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="211"></a>
<strong>If you have already patched your kernel</strong>
<blockquote><p>If you have a properly functioning Perfctr patch from a previous release of PAPI, you will obviously not want to repatch your kernel. PAPI is compatible with PerfCtr 2.4.x and Perfctr 2.6.x.<br>
<br>
The x86 Makefiles:<br>
Makefile.linux-perfctr-p3<br>
Makefile.linux-perfctr-p4<br>
Makefile.linux-athlon<br>
Makefile.linux-opteron<br>
<br>
To recompile PAPI *not* using the included PerfCtr distribution, you simply pass the PERFCTR argument to the appropriate Makefile.<br>
<br>
make -f Makefile.linux-perfctr-p3 PERFCTR=/usr/src/perfctr-2.4.x<br>
<br>
To use Perfctr 2.6.x, simply type:<br>
make -f Makefile.linux-perfctr-p3<br>
<br>
To use the older version:<br>
make -f Makefile.linux-perfctr-p3 VERSION=2.4.x<br>
<br>
Easy huh?</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="121"></a>
<strong>How do I patch my Linux/Pentium I, II, III, IV, AMD K7, K8 box to work with PAPI?</strong>
<blockquote><p>See the INSTALL file in papi/src/perfctr-2.6.x. The instructions are very, very simple. Do not use perfctr-2.4.x unless you have to. The perfctr version is not tied to the Linux kernel version!</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="265"></a>
<strong>After reboot, the /dev/perfctr file always seems to have the wrong permissions and PAPI fails to initialize. What's going on?</strong>
<blockquote><p>You are probably running udev, which is not smart enough to know the permissions of dynamically created devices. To fix this, find your udev/devices directory, often /lib/udev/devices or /etc/udev/devices, and perform the following actions.</p><p> mknod perfctr c 10 182<br>
chmod 644 perfctr</p><p>On Ubuntu 6.06 (and probably other Debian-based distros), add a line to /etc/udev/rules.d/40-permissions.rules like this:</p><p> KERNEL=="perfctr", MODE="0666"</p><p>On SuSE, you may need to add something like the following to
/etc/udev/rules.d/50-udev-default.rules
(SuSE does not have the 40-permissions.rules file):</p><p># cpu devices<br>
KERNEL=="cpu[0-9]*", NAME="cpu/%n/cpuid"<br>
KERNEL=="msr[0-9]*", NAME="cpu/%n/msr"<br>
KERNEL=="microcode", NAME="cpu/microcode", MODE="0600"<br>
KERNEL=="perfctr", NAME="perfctr", MODE="0644"<br></p><p>These lines tell udev to always create the device file with the appropriate permissions. Use 'perfex -i' from the perfctr distribution to test this fix.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="212"></a>
<strong>Hardware interrupt driven counters</strong>
<blockquote><p>YOU MUST COMPILE YOUR KERNEL WITH APIC SUPPORT IF YOU WANT
INTERRUPT SUPPORT!<br>
With Perfctr 2.3.3 or later it is possible to make the performance counters
generate an interrupt when the counter reaches a certain count. This requires
support in the Linux kernel, Perfctr, PAPI and the CPU to work properly.<br>
The necessary kernel support is available if your kernel is compiled with
SMP APIC support or uni-processor APIC support. This is true
for 2.4-ac kernels and kernels 2.4.10 or later. This topic is discussed in
more detail in Mikael Pettersson's installation instructions for PerfCtr.<br>
Your CPU must be a Pentium 686/AMD K7 or similar which can generate APIC
interrupts for performance counter events. This is _not_ true for some mobile
Pentiums and early revisions of the AMD K7 or Athlon.<br>
You can verify that all is working by running the perfctr/examples/perfex
program with the -i flag. If you do not see "pcint" as one of the flags,
you need to recompile your kernel or buy a real CPU. ;-)</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="259"></a>
<strong>Why do PAPI_LD_INS and PAPI_SR_INS give identical results on Pentium 4?</strong>
<blockquote><p>Counting memory load and store instructions on the Pentium 4 is a two step process. First the desired events are tagged at the front of the pipeline. Then tagged events are counted as they graduate from the end of the pipeline. Unfortunately, the tags are all the same 'color' and can't be differentiated as they exit the pipe. Thus, you can correctly measure LD instructions, or correctly measure SR instructions, but if you try to measure them both at once, you will always get the sum of both operations in both counters. The same applies to PAPI_LST_INS.</p><p>This behavior is demonstrated in the test program ctests/p4_lst_ins.c.</p><p>The moral of the story is to always use these three events one-at-a-time on Pentium 4 machines.</p></blockquote>
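<p>The effect described above can be illustrated with a toy model. This is not PAPI code and does not touch the hardware; the function name <code>p4_tagged_counts</code> and the instruction counts are invented for illustration, and the model only mimics the "all tags look alike" behavior described in the text.</p>

```python
def p4_tagged_counts(loads, stores, measured):
    """Toy model of Pentium 4 tagging for PAPI_LD_INS and PAPI_SR_INS.

    loads, stores -- true numbers of load and store instructions
    measured      -- list of preset names being counted at the same time
    Returns a dict mapping each measured preset to the value it would report.
    """
    true_counts = {"PAPI_LD_INS": loads, "PAPI_SR_INS": stores}
    # Every measured event tags its instructions at the front of the pipe.
    # Tags cannot be told apart at retirement, so each counter sees the
    # total number of tagged instructions.
    tagged_total = sum(true_counts[name] for name in measured)
    return {name: tagged_total for name in measured}


if __name__ == "__main__":
    # Measured one at a time, each event is correct.
    print(p4_tagged_counts(700, 300, ["PAPI_LD_INS"]))
    print(p4_tagged_counts(700, 300, ["PAPI_SR_INS"]))
    # Measured together, both counters report the sum.
    print(p4_tagged_counts(700, 300, ["PAPI_LD_INS", "PAPI_SR_INS"]))
```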
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="213"></a>
<strong>Floating point counts on the Pentium 4 series</strong>
<blockquote><p>The Pentium 4 can generate floating point
instructions either through the x87 floating point unit or with SSE
instructions.<br>
Furthermore SSE can generate either packed (multiple operands in one
128-bit
register) or unpacked (single operand in one 128-bit register)
instructions.<br>
Depending on your compiler and settings you will get different
instruction mixes.<br>
<br>
PAPI provides 2 preset events to count floating point operations:<br>
- PAPI_FP_INS counts instructions passing through the floating point
unit;<br>
- PAPI_FP_OPS counts something closer to theoretical floating point
operations.<br>
<br>
To minimize the overlap and maximize the usefulness of these two events
on
Pentium 4, we have made the following choices:<br>
- PAPI_FP_INS always counts only x87 floating point operations.<br>
- PAPI_FP_OPS counts can be customized as discussed below.<br>
<br>
Further complicating things is that the Pentium 4 hardware is too
restrictive
to count all these modes at once, so a decision must be made about what
to count.<br>
In order to enable PAPI to count these various mixes, we support 2
methods.<br>
<br>
1) The PAPI_PENTIUM4_FP_xxx defines.<br>
<br>
Set these in the EVENTFLAGS of either the
Makefile.linux-perfctr-p4 or<br>
Makefile.linux-perfctr-em64t.<br>
<br>
-DPAPI_PENTIUM4_FP_X87<br>
-DPAPI_PENTIUM4_FP_X87_SSE_SP<br>
-DPAPI_PENTIUM4_FP_X87_SSE_DP<br>
-DPAPI_PENTIUM4_FP_SSE_SP_DP<br>
<br>
The predefined value for Nocona/EM64T/Pentium 4 Model 3 is:<br>
<br>
-DPAPI_PENTIUM4_FP_X87_SSE_DP.<br>
<br>
The predefined value for anything else is:<br>
<br>
-DPAPI_PENTIUM4_FP_X87.<br>
<br>
If nothing is defined, the substrate defaults to:<br>
<br>
-DPAPI_PENTIUM4_FP_X87_SSE_DP.<br>
<br>
2) The PAPI_PENTIUM4_FP environment variable.<br>
<br>
Set this to one or two of the following, and it will
change the<br>
behavior of PAPI_FP_OPS.<br>
<br>
X87: count all x87 instructions<br>
SSE_SP: count all unpacked SSE single precision
instructions<br>
SSE_DP: count all unpacked SSE double precision
instructions<br>
<br>
Due to the design of the register set, only 2 of the three
are countable<br>
at one time. Sorry folks.</p></blockquote>
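<p>The "only 2 of the three" restriction can be checked before a run. The sketch below is hypothetical and not part of PAPI: it takes the chosen modes as a Python list, because the exact syntax PAPI expects inside the PAPI_PENTIUM4_FP variable is not described here, and the name <code>check_p4_fp_modes</code> is made up.</p>

```python
P4_FP_MODES = ("X87", "SSE_SP", "SSE_DP")


def check_p4_fp_modes(modes):
    """Validate a choice of PAPI_FP_OPS modes for the Pentium 4.

    Returns the de-duplicated selection as a tuple, or raises ValueError
    if a name is unknown or if the selection is not one or two modes.
    """
    selected = tuple(dict.fromkeys(modes))  # drop repeats, keep order
    unknown = [mode for mode in selected if mode not in P4_FP_MODES]
    if unknown:
        raise ValueError("unknown mode(s): " + ", ".join(unknown))
    if not 1 <= len(selected) <= 2:
        raise ValueError("choose one or two of: " + ", ".join(P4_FP_MODES))
    return selected


if __name__ == "__main__":
    print(check_p4_fp_modes(["X87", "SSE_DP"]))
```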
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="214"></a>
<strong>Vector instruction counts on the Pentium 4 series</strong>
<blockquote><p>PAPI can count 2 different types of vector instructions on the Pentium
4:<br>
either MMX instructions or packed SSE floating point instructions. These
are supported with 2 methods, in a similar fashion to the floating point
events described above.<br>
<br>
1) The PAPI_PENTIUM4_VEC_xxx defines.<br>
<br>
Set these in the EVENTFLAGS of either the
Makefile.linux-perfctr-p4 or<br>
Makefile.linux-perfctr-em64t.<br>
<br>
-DPAPI_PENTIUM4_VEC_MMX<br>
-DPAPI_PENTIUM4_VEC_SSE<br>
<br>
The current default for all platforms is:<br>
<br>
-DPAPI_PENTIUM4_VEC_SSE.<br>
<br>
If nothing is defined, the substrate defaults to:<br>
<br>
-DPAPI_PENTIUM4_VEC_SSE.<br>
<br>
2) The PAPI_PENTIUM4_VEC environment variable.<br>
<br>
Set this to either of the following, and it will change the<br>
behavior of PAPI_VEC_INS.<br>
<br>
SSE: count all packed SSE SP and DP instructions<br>
MMX: count all 64 and 128 bit MMX instructions</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="215"></a>
<strong>The memory test sometimes fails on Athlon Processors.</strong>
<blockquote><p>This is a known issue and we are looking into the cause. Currently, we have no fix or workaround.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="262"></a>
<strong>Floating Point counts on AMD Opteron</strong>
<blockquote><p>(The following discussion does not apply to newer quad-core and higher Opteron processors)</p>
<p>The AMD Opteron is the first chip series from AMD that can measure and report floating point operations. Two native events measure floating point activity. One measures speculative operations that enter the FP units; the other measures operations that retire from the FP units.</p>
<p>The retired event generates precise event counts that scale with the amount of work done. However, it measures data movement as well as floating point operations, resulting in counts that are consistently significantly higher than the expected theoretical counts, often by factors of 2 or more.</p>
<p>The speculative event can be configured to generate counts of only the operations typically of interest. Since these counts are speculative, they tend to exceed the expected theoretical counts, often by widely variable amounts, especially on complex production codes.</p>
<p>PAPI provides 2 preset events to count floating point operations:<br /><br /> - PAPI_FP_INS counts instructions passing through the floating point unit;<br /> - PAPI_FP_OPS is intended to count something closer to theoretical floating point operations.<br /><br /> To minimize the overlap and maximize the usefulness of these two events on AMD Opteron, we have made the following choices:<br /></p>
<p>- PAPI_FP_INS always counts retired floating point operations. This value will be precise and accurate, but will include FP loads and stores as well as computations.</p>
<p>- PAPI_FP_OPS counts speculative computation operations by default, but can be customized as discussed below.<br /></p>
<p>As an alternative to counting speculative computations, PAPI_FP_OPS can be configured to count retired operations corrected for data movement. Unfortunately, the correction factors themselves are speculative, and can lead to undercounting errors similar in magnitude to those seen in the pure speculative counts.</p>
<p>Two methods are provided to allow customization of PAPI_FP_OPS:<br /><br /> 1) The PAPI_OPTERON_FP_xxx defines.<br /><br /> Set these in the CFLAGS variable of Makefile.linux-perfctr-opteron.<br /><br /> -DPAPI_OPTERON_FP_RETIRED<br /> -DPAPI_OPTERON_FP_SSE_SP<br /> -DPAPI_OPTERON_FP_SSE_DP<br /> -DPAPI_OPTERON_FP_SPECULATIVE<br /><br /> The default value is equivalent to:<br /><br /> -DPAPI_OPTERON_FP_SPECULATIVE.<br /><br /> 2) The PAPI_OPTERON_FP environment variable.<br /><br /> Set this to one of the following, and it will change the
behavior of PAPI_FP_OPS.<br /></p>
<p>RETIRED: count all retired FP instructions<br /> SSE_SP: correct retired counts optimized for single precision<br /> SSE_DP: correct retired counts optimized for double precision<br /> SPECULATIVE: count speculative computations (default)</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<br />
<a name="47"></a>
<h2>Solaris-Ultra</h2>
<font color="#666"><p></p></font>
<a name="193"></a>
<strong>General Comments</strong>
<blockquote><p>Assembler stubs for get_tick() and cpu_sync() as well as the following defines have been blatantly stolen from the perfmon code. The author of the package "perfmon" is Richard J. Enbody and the home page for "perfmon" is <a href="http://www.cse.msu.edu/~enbody/perfmon.html">http://www.cse.msu.edu/~enbody/perfmon.html</a>. For *all* the native event names, run native_avail in the ctests subdirectory. For how to use the native event names, see native.c</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="194"></a>
<strong>Bugs</strong>
<blockquote><p>1) Ultra I/II/III/III+ are currently supported;</p><p>2) Some of the cache events have documented bugs, see the Sun UltraSparc hardware reference manual.</p><p>3) WARNING FOR PEOPLE USING MULTITHREADED LIBRARIES ON SOLARIS 2.8:
There is a bug that prevents setitimer() from being called after the process has called pthread_create() at any point in time. Therefore, if you suspect your communication library is multithreaded, you had better start the instrumentation before initializing it. See multiplex3_pthreads for details.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
<a name="76"></a>
<strong>My Sun box doesn't have libcpc.h. What should I do?</strong>
<blockquote><p>You didn't check the <a href="http://icl.cs.utk.edu/papi/custom/index.html?lid=62&slid=96">PAPI Supported Platform Matrix</a>. The hardware counters on SunOS with UltraSparc are only available on SunOS 5.8 and above. That's Solaris 2.8 for you SVR4 people.</p></blockquote>
<center><font size="-1"><a href="#top">back to top</a></font></center>
<br /> <br />
</td>
</tr>
</table>
</center>
</body>
</html>