This file describes the most significant changes. For more detail, use
'git log' on a clone of the charm repository.
================================================================================
What's new in Charm++ 6.7.0
================================================================================
Over 120 bugs fixed, spanning areas across the entire system
Charm++ Features
- New API for efficient formula-based distributed sparse array creation
- CkLoop is now built by default
- CBase_Foo::pup need not be called from Foo::pup in user code anymore - runtime
code handles this automatically
- Error reporting and recovery in .ci files is greatly improved, providing more
precise line numbers and often column information
- Many data races occurring under shared-memory builds (smp, multicore) were
fixed, facilitating use of tools like ThreadSanitizer and Helgrind
AMPI Enhancements
- Further MPI standard compliance in AMPI allows users to build and run
Hypre-2.10.1 on AMPI with virtualization, migration, etc.
- Improved AMPI Fortran2003 PUP interface 'apup', similar to C++'s STL PUP
Platforms and Portability
- Compiling Charm++ now requires support for C++11 variadic templates. In GCC,
this became available with version 4.3, released in 2008
- New machine target for multicore Linux ARM7: multicore-linux-arm7
- Preliminary support for POWER8 processors, in preparation for the upcoming
Summit and Sierra supercomputers
- The charmrun process launcher is now much more robust in the face of slow
or rate-limited connections to compute nodes
- PXSHM now auto-detects the node size, so the '+nodesize' option is no longer
needed
- Out-of-tree builds are now supported
Deprecations
- CommLib has been removed.
- CmiBool has been dropped in favor of C++'s bool
================================================================================
What's new in Charm++ 6.6.0
================================================================================
- Machine target files for Cray XC systems ('gni-crayxc') have been added
- Interoperability with MPI code using native communication interfaces on Blue
Gene Q (PAMI) and Cray XE/XK/XC (uGNI) systems, in addition to the universal
MPI communication interface
- Support for partitioned jobs on all machine types, including TCP/IP and IB
Verbs networks using 'netlrts' and 'verbs' machine layers
- A substantially improved version of our asynchronous library, CkIO, for
parallel output of large files
- Narrowing the circumstances in which the runtime system will send
overhead-inducing ReductionStarting messages
- A new fully distributed load balancing strategy, DistributedLB, that produces
high quality results with very low latency
- An API for applications to feed custom per-object data to specialized load
balancing strategies (e.g. physical simulation coordinates)
- SMP builds on LRTS-based machine layers (pamilrts, gni, mpi, netlrts, verbs)
support tracing messages through communication threads
- Thread affinity mapping with +pemap now supports Intel's Hyperthreading more
conveniently
- After restarting from a checkpoint, thread affinity will use new
+pemap/+commap arguments
- Queue order randomization options were added to assist in debugging race
conditions in application and runtime code
- The full runtime code and associated libraries can now compile under the C11
and C++11/14 standards.
- Numerous bug fixes, performance enhancements, and smaller improvements in the
provided runtime facilities
- Deprecations
* The long-unsupported FEM library has been deprecated in favor of ParFUM
* The CmiBool typedefs have been deprecated, as C++ bool has long been universal
* Future versions of the runtime system and libraries will require some degree
of support for C++11 features from compilers
================================================================================
What's new in Charm++ 6.5.0
================================================================================
- The Charm++ manual has been thoroughly revised to improve its organization,
comprehensiveness, and clarity, with many additional example code snippets
throughout.
- The runtime system now includes the 'Metabalancer', which can provide
substantial performance improvements for applications that exhibit dynamic
load imbalance. It provides two primary benefits. First, it automatically
optimizes the frequency of load balancer invocation, to avoid work stoppage
when it will provide too little benefit. Second, calls to AtSync() are made
less synchronous, to further reduce overhead when the load balancer doesn't
need to run. To activate the Metabalancer, pass the option +MetaLB at
runtime. To get the full benefits, calls to AtSync() should be made at every
iteration, rather than at some arbitrary longer interval as was previously
common.
- Many feature additions and usability improvements have been made in the
interface translator that generates code from .ci files:
* Charmxi now provides much better error reports, including more accurate
line numbers and clearer reasons for failure, including some semantic
problems that would otherwise appear when compiling the C++ code or even at
runtime.
* A new SDAG construct 'case' has been added that defines a disjunction over a
set of 'when' clauses: only one 'when' out of a set will ever be triggered.
* Entry method templates are now supported. An example program can be found
in tests/charm++/method_templates/.
* SDAG keyword "atomic" has been deprecated in favor of the newly supported
keyword "serial". The two are synonymous, but "atomic" is now provided only
for backward compatibility.
* It is no longer necessary to call __sdag_init() in chares that contain SDAG
code - the generated code does this automatically. The function is left as
a no-op for compatibility, but may be removed in a future version.
* Code generated from .ci files is now primarily in .def.h files, with only
declarations in .decl.h. This improves debugging, speeds compilation,
provides clearer compiler output, and enables more complete encapsulation,
especially in SDAG code.
* Mainchare constructors are expected to take CkArgMsg*, and always have
been. However, charmxi would allow declarations with no argument, and
assume the message. This is now deprecated, and generates a warning.
- Projections tracing has been extended and improved in various ways
* The trace module can generate a record of network topology of the nodes in
a run for certain platforms (including Cray), which Projections can
visualize.
* If the gzip library (libz) is available when Charm++ is compiled, traces
are compressed by default.
* If traces were flushed as a result of filled buffers during the run, a
warning will be printed at exit to indicate that the user should be wary of
interference that may have resulted.
* In SMP builds, it is now possible to trace message progression through the
communication threads. This is disabled by default to avoid overhead and
potential misleading interpretation.
- Array elements can be block-mapped at the SMP node level instead of at the
per-PE level (option "+useNodeBlkMapping").
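The difference between per-PE and per-node block mapping can be illustrated with a minimal sketch. The function name block_map and the ceiling-sized blocks are assumptions made for illustration; the source does not state the exact placement rule used by the runtime, nor how elements are spread over the PEs within a node.

```cpp
#include <cassert>

// Hypothetical helper, not part of Charm++: block-map element `index` of
// `nElements` onto `nBins` bins, giving each bin a contiguous run of
// ceil(nElements / nBins) elements. Pass the number of PEs as nBins for
// per-PE mapping, or the number of SMP nodes for node-level mapping.
int block_map(int index, int nElements, int nBins) {
    assert(nBins > 0 && index >= 0 && index < nElements);
    int blockSize = (nElements + nBins - 1) / nBins;  // ceiling division
    return index / blockSize;
}
```

With 16 elements, block_map(5, 16, 8) places element 5 in bin 2 when the bins are 8 PEs, while block_map(5, 16, 2) places it in bin 0 when the bins are 2 nodes.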
- AMPI can now privatize global and static variables using TLS. This is
supported in C and C++ with __thread markings on the variable declarations
and definitions, and in Fortran with a patched version of the gfortran
compiler. To activate this feature, append '-tls' to the '-thread' option's
argument when you link your AMPI program.
- Charm can now be built to only support message priorities of a specific data
type. This enables an optimized message queue within the runtime
system. Typical applications with medium sized compute grains may not benefit
noticeably when switching to the new scheduler. However, this may permit
further optimizations in later releases.
The new queue is enabled by specifying the data type of the message
priorities while building charm using --with-prio-type=dtype. Here, dtype can
be one of char, short, int, long, float, double and bitvec. Specifying bitvec
will permit arbitrary-length bitvector priorities, and is the current default
mode of operation. However, we may change this in a future release.
- Converse now provides a complete set of wrappers for
fopen/fread/fwrite/fclose to handle EINTR, which is not uncommon on the
increasingly-popular Lustre. They are named CmiF{open,read,write,close}, and
are available from C and C++ code.
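The idea behind EINTR-handling wrappers can be sketched in portable C++ as follows. This is not the Converse implementation: retry_fopen and retry_fread are hypothetical names, and the retry policy (retry whenever errno is EINTR) is an assumption about how such wrappers typically behave.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for an EINTR-safe fopen: retry while the call
// fails specifically because it was interrupted by a signal.
std::FILE* retry_fopen(const char* path, const char* mode) {
    std::FILE* f = nullptr;
    do {
        errno = 0;
        f = std::fopen(path, mode);
    } while (f == nullptr && errno == EINTR);
    return f;
}

// Hypothetical stand-in for an EINTR-safe fread. Reads up to nbytes bytes
// and resumes after interrupted calls. A return value smaller than nbytes
// means end of file or a non-EINTR error.
std::size_t retry_fread(void* buf, std::size_t nbytes, std::FILE* f) {
    assert(buf != nullptr && f != nullptr);
    char* out = static_cast<char*>(buf);
    std::size_t done = 0;
    while (done < nbytes) {
        errno = 0;
        done += std::fread(out + done, 1, nbytes - done, f);
        if (done == nbytes) break;
        if (std::ferror(f) && errno == EINTR) {
            std::clearerr(f);  // clear the error flag and resume reading
            continue;
        }
        break;  // end of file or a real error
    }
    return done;
}
```

Reading in units of one byte keeps the resume logic simple, because a partially read multi-byte item cannot be lost between retries.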
- The utility class 'CkEntryOptions' now permits method chaining for cleaner
usage. This applies to all its set methods (setPriority, setQueueing,
setGroupDepID). Example usage can be found in examples/charm++/prio/pgm.C.
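Method chaining of this kind is usually achieved by having each setter return a reference to the object. The class below is a self-contained mock, not the real CkEntryOptions: only the three setter names come from the text above, while the int parameter types and the getters are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical mock of a chainable options class. The real CkEntryOptions
// setters take Charm++ types; plain int is used here so the sketch compiles
// on its own.
class EntryOptionsSketch {
public:
    EntryOptionsSketch& setPriority(int p) { priority_ = p; return *this; }
    EntryOptionsSketch& setQueueing(int q) { queueing_ = q; return *this; }
    EntryOptionsSketch& setGroupDepID(int g) {
        assert(g >= 0);  // mock group IDs are non-negative
        depGroup_ = g;
        return *this;
    }

    int priority() const { return priority_; }
    int queueing() const { return queueing_; }
    int groupDepID() const { return depGroup_; }

private:
    int priority_ = 0;
    int queueing_ = 0;
    int depGroup_ = -1;  // -1 means no dependence declared
};
```

A caller can then write opts.setPriority(-5).setQueueing(2).setGroupDepID(7) in a single statement instead of three.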
- When creating groups or chare arrays that depend on the previous construction
of another such entity on the local PE, it is now possible to declare that
dependence to the runtime. Creation messages whose dependence is not yet
satisfied will be buffered until it is.
- For any given chare class Foo and entry method Bar, the supporting class's
member CkIndex_Foo::Bar() is used to look up or specify the entry method
index. This release adds a newer API for such members where the argument is a
function pointer of the same signature as the entry method. Those new
functions are used like CkIndex_Foo::idx_Bar(&Foo::Bar). This permits entry
point index lookup without instantiating temporary variables just to feed the
CkIndex_Foo::Bar() methods. In cases where Foo::Bar is overloaded, &Foo::Bar
must be cast to the desired type to disambiguate it.
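The cast needed for overloaded entry methods is ordinary C++ member-function-pointer disambiguation. The sketch below uses mock types: Foo and IndexSketch are hypothetical stand-ins for a chare class and its generated CkIndex_Foo helper, and the returned index values are made up.

```cpp
#include <cassert>

// Hypothetical chare-like class with an overloaded method.
struct Foo {
    void Bar(int) {}
    void Bar(double) {}
};

// Hypothetical stand-in for the generated index class. The pointer argument
// is used only to select an overload; it is never called.
struct IndexSketch {
    static int idx_Bar(void (Foo::*)(int))    { return 0; }
    static int idx_Bar(void (Foo::*)(double)) { return 1; }
};

// IndexSketch::idx_Bar(&Foo::Bar) alone would be ambiguous, so the address
// is cast to the exact member-function-pointer type that is wanted.
inline int index_of_int_overload() {
    return IndexSketch::idx_Bar(static_cast<void (Foo::*)(int)>(&Foo::Bar));
}

inline int index_of_double_overload() {
    return IndexSketch::idx_Bar(static_cast<void (Foo::*)(double)>(&Foo::Bar));
}

// Sanity check: the two overloads must resolve to different indices.
inline void check_distinct() {
    assert(index_of_int_overload() != index_of_double_overload());
}
```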
- CkReduction::reducerType now has PUP methods defined, and can hence be
passed as a parameter-marshalled argument to entry methods.
- The runtime option +stacksize for controlling the allocation of user-level
threads' stacks now accepts shorthand notation such as 1M.
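A parser for size shorthand of this form might look like the sketch below. The function name parse_size, the accepted suffixes (K, M, G, case-insensitive), and the use of binary multiples are all assumptions; the source only gives 1M as an example.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical parser, not the Charm++ implementation. Returns the size in
// bytes, or 0 for malformed input.
unsigned long long parse_size(const std::string& text) {
    if (text.empty() || !std::isdigit(static_cast<unsigned char>(text[0]))) {
        return 0;
    }
    char* end = nullptr;
    unsigned long long value = std::strtoull(text.c_str(), &end, 10);
    assert(end != nullptr);
    switch (std::toupper(static_cast<unsigned char>(*end))) {
        case '\0': return value;              // plain number, no suffix
        case 'K': ++end; value <<= 10; break;
        case 'M': ++end; value <<= 20; break;
        case 'G': ++end; value <<= 30; break;
        default: return 0;                    // unknown suffix
    }
    return *end == '\0' ? value : 0;          // reject trailing characters
}
```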
- The -optimize flag to the charmc compiler wrapper now passes more aggressive
options to the various underlying compilers than the previous '-O'.
- The charmc compiler wrapper now provides a flag -use-new-std to enable
support for C11 and C++11 where available. To use this in application code,
the runtime system must have been built with that flag as well.
- When using CmiMemoryUsage(), the runtime can be instructed not to use the
underlying mallinfo() library call, which can be inaccurate in settings where
usage exceeds INT_MAX. This is accomplished by setting the environment
variable "MEMORYUSAGE_NO_MALLINFO".
- Experimental Features
* Initial implementation of a fast message-logging protocol. Use option
'mlogft' to build it.
* Message compression support for persistent messages on the Gemini machine
layer.
* Node-level inter-PE loop/task parallelization is now supported through
CkLoop
* New temperature/CPU frequency aware load balancer
* Support interoperation of Charm++ and native MPI code through dynamically
switching control between the two
* API in centralized load balancers to get and set PE speed
* A new scheme for optimization of double in-memory checkpoint/restart.
* Message combining library for improved fine-grained communication
performance
* Support for partitioning of allocated nodes into subsets that run
independent Charm++ instances but can interact with each other.
Platform-Specific Changes
-------------------------
- Cray XE/XK
* The gemini_gni network layer has been heavily tuned and optimized,
providing substantial improvements in performance, scalability, and
stability.
* The gemini_gni-crayxe machine layer supports a 'hugepages' option at build
time, rather than requiring manual configuration file editing.
* Persistent message optimizations can be used to reduce latency and
overheads
* Experimental support for 'urgent' sends, which are sent ahead of any other
outgoing messages queued for transmission.
- IBM Blue Gene Q: Experimental machine-layer support for the native PAMI
interface and MPI, with and without SMP support. This supports many new
systems, including LLNL's Sequoia, ALCF's Mira, and FZ Juelich's Juqueen.
There are three network-layer implementations for these systems: 'mpi',
'pami', and 'pamilrts'. The 'mpi' layer is stable, but its performance and
scalability suffer from the additional overhead of using MPI rather than
driving the interconnect directly. The 'pami' layer is well tested for NAMD,
but has shown instability for other applications. It is likely to be replaced
by the 'pamilrts' layer, which is more generally stable and seems to provide
the same performance, in the next release.
In addition to the common 'smp' option to build the runtime system with
shared memory support, there is an 'async' option which sometimes provides
better performance on SMP builds. This option passes tests on 'pamilrts', but
is still experimental.
Note: Applications that send a large number of messages may crash in the
default setup due to overflow in the low-level FIFOs. The environment variables
MUSPI_INJFIFOSIZE and PAMI_RGETINJFIFOSIZE can be set to avoid application
failures due to a large number of small and large messages, respectively. The
default value of these variables is 65536, which is sufficient for 1000
messages in flight.
- Infiniband Verbs: Better support for more flavors of ibverbs libraries
- MPI Network Layer
* Experimental rendezvous protocol for better performance above some MPI
implementations.
* Some tuning parameters ("+dynCapSend" and "+dynCapRecv") are now
configurable at job launch, rather than Charm++ compilation.
- PGI C++: Disable automatic 'using namespace std;'
- Charm++ now supports ARM, both non-smp and smp.
- Mac OS X: Compilation options to build and link correctly on newer versions
================================================================================
What's new in Charm++ 6.4.0
================================================================================
--------------------------------------------------------------------------------
Platform Support
--------------------------------------------------------------------------------
- Cray XE and XK systems using the Gemini network via either MPI
(mpi-crayxe) or the native uGNI (gemini_gni-crayxe)
- IBM Blue Gene Q, using MPI (mpi-bluegeneq) or PAMI (pami-bluegeneq)
- Clang, Cray, and Fujitsu compilers
- MPI-based machine layers can now run on >64k PEs
--------------------------------------------------------------------------------
General Changes
--------------------------------------------------------------------------------
- Added a new [reductiontarget] attribute to enable
parameter-marshaled recipients of reduction messages
- Enabled pipelining of large messages in CkMulticast by default
- New load balancers added:
* TreeMatch
* Zoltan
* Scotch graph partitioning based: ScotchLB and Refine and Topo variants
* RefineSwap
- Load balancing improvements:
* Allow reduced load database size using floats instead of doubles
* Improved hierarchical balancer
* Periodic balancing adapts its interval dynamically
* User code can request a callback when migration is complete
* More balancers properly consider object migratability and PE
availability and speed
* Instrumentation records multicasts
- Chare arrays support options that can enable some optimizations
- New 'completion detection' library for parallel process termination
detection, when the need for modularity excludes full quiescence
detection
- New 'mesh streamer' library for fine-grain many-to-many collectives,
handling message bundling and network topology
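The message-bundling half of such a library can be sketched as per-destination buffering with a flush threshold. AggregatorSketch is a hypothetical class, not the mesh streamer API, and it leaves out the network-topology routing mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical aggregator: fine-grained items are buffered per destination
// and handed to `flush` in batches of up to `capacity` items.
class AggregatorSketch {
public:
    using Flush = std::function<void(std::size_t dest, const std::vector<int>& batch)>;

    AggregatorSketch(std::size_t nDests, std::size_t capacity, Flush flush)
        : buffers_(nDests), capacity_(capacity), flush_(std::move(flush)) {
        assert(nDests > 0 && capacity > 0);
    }

    // Buffer one item; send the batch once the destination's buffer is full.
    void insert(std::size_t dest, int item) {
        std::vector<int>& buf = buffers_.at(dest);
        buf.push_back(item);
        if (buf.size() >= capacity_) send(dest);
    }

    // Send whatever is still buffered, for example at the end of a phase.
    void flushAll() {
        for (std::size_t d = 0; d < buffers_.size(); ++d) {
            if (!buffers_[d].empty()) send(d);
        }
    }

private:
    void send(std::size_t dest) {
        flush_(dest, buffers_[dest]);
        buffers_[dest].clear();
    }

    std::vector<std::vector<int>> buffers_;
    std::size_t capacity_;
    Flush flush_;
};
```

Larger capacities mean fewer, bigger messages at the cost of added latency for the items that wait in a buffer.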
- Memory pooling allocator performance and resource usage improved
substantially
- AMPI: More routines support MPI_IN_PLACE, and those that don't check
for it
================================================================================
What's new in Charm++ 6.2.1 (since 6.2.0)
================================================================================
--------------------------------------------------------------------------------
New Supported Platforms:
--------------------------------------------------------------------------------
POWER7 with LAPI on Linux
Infiniband on PowerPC
--------------------------------------------------------------------------------
General Changes
--------------------------------------------------------------------------------
- Better support for multicasts on groups
- Topology information gathering has been optimized
- Converse (seed) load balancers have many new optimizations applied
- CPU affinity can be set more easily using +pemap and +commap options
instead of the older +coremap
- HybridLB (hierarchical balancing for very large core-count systems)
has been substantially improved
- Load balancing infrastructure has further optimizations and bug fixes
- Object mappings can be read from a file, to allow offline
topology-aware placement
- Projections logs can be spread across multiple directories, speeding
up output when dealing with thousands of cores (+trace-subdirs N
will divide log files evenly among N subdirectories of the trace
root, named PROGNAME.projdir.K)
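One way to divide per-PE log files evenly among N subdirectories is round-robin by PE number, as in the sketch below. The helper name trace_subdir, the modulo rule, and numbering K from 0 are assumptions; the source only gives the PROGNAME.projdir.K naming pattern.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the Projections tracing code: pick the
// subdirectory that would hold the log of processing element `pe`.
std::string trace_subdir(const std::string& prog, int pe, int nSubdirs) {
    assert(pe >= 0 && nSubdirs > 0);
    return prog + ".projdir." + std::to_string(pe % nSubdirs);
}
```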
- AMPI now implements MPI_Issend
- AMPI's MPI_Alltoall uses a flooding algorithm more aggressively,
versus pairwise exchange
- Virtualized ARMCI support has been extended to cover the functions
needed by CAF
--------------------------------------------------------------------------------
Architecture-specific changes
--------------------------------------------------------------------------------
- LAPI SMP has many new optimizations applied
- Net builds support the use of clusters' mpiexec systems for job
launch, via the ++mpiexec option to charmrun
================================================================================
What's new in Charm++ 6.2.0 (since 6.1)
================================================================================
--------------------------------------------------------------------------------
New Supported Platforms:
--------------------------------------------------------------------------------
64-bit MIPS, such as SiCortex, using mpi-linux-mips64
Windows HPC cluster, using mpi-win32/mpi-win64
Mac OSX 10.6, Snow Leopard (32-bit and 64-bit).
--------------------------------------------------------------------------------
General Changes
--------------------------------------------------------------------------------
Runtime support
- Smarter build/configure scripts
- A new interface for model-based load balancing
- new CPU topology API
- a general implementation of CmiMemoryUsage()
- Bug fix: Quiescence detection (QD) works with immediate messages
- New reduction functions implemented in Converse
- CCS (Converse Client-Server) can deliver messages to more than one processor
- Added a memory-aware adaptive scheduler, which can be optionally
compiled in to charm
- Added preliminary support for automatic message prioritization
(disabled by default)
Charm++
- Cross-array and cross-group sections
- Structured Dagger (SDAG): Support templated arguments properly
- Plain chares support checkpoint/restart (both in-memory and disk-based)
- Conditional packing of messages and parameters in SMP scenario
- Changes to the CkArrayIndex class hierarchy
-- sizeof() all CkArrayIndex* classes is now the same
-- Code using custom array indices has to use placement-new to construct
the custom index. Refer to the example code: examples/charm++/hello/fancyarray/
-- *** Backward Incompatibility ***
CkArrayIndex[4D/5D/6D]::index are now of type int (instead of short).
However, the data is stored as shorts. Access it by casting
CkArrayIndexND::data() appropriately.
-- *** Deprecated ***
The direct use of public data member
CkArrayIndexND::index (N=1..6) is deprecated. We reserve the right to
change/remove this variable in future releases of Charm++.
Instead, please access the indices via member function:
int CkArrayIndexND::data()
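The placement-new requirement for custom indices follows from every index class having the same size: the custom key has to be constructed inside storage that the base class already owns. The sketch below uses mock types only. IndexBaseSketch, FancyKey, and FancyIndexSketch are hypothetical names, and the three-int storage size is an assumption; the real example lives in examples/charm++/hello/fancyarray/.

```cpp
#include <cassert>
#include <new>

// Hypothetical mock of a fixed-size index base class, not the real
// CkArrayIndex. All index kinds share the same raw storage.
struct IndexBaseSketch {
    int nInts = 0;
    alignas(int) unsigned char storage[3 * sizeof(int)];
};

// A user-defined key that must fit inside the base class storage.
struct FancyKey {
    int a, b;
    FancyKey(int a_, int b_) : a(a_), b(b_) {}
};

struct FancyIndexSketch : IndexBaseSketch {
    FancyKey* key;

    FancyIndexSketch(int a, int b) {
        static_assert(sizeof(FancyKey) <= sizeof(storage), "key must fit in storage");
        static_assert(alignof(FancyKey) <= alignof(int), "storage is int-aligned");
        key = new (storage) FancyKey(a, b);  // construct in place, no heap allocation
        nInts = static_cast<int>(sizeof(FancyKey) / sizeof(int));
    }

    // Copying would leave `key` pointing into the source object's storage.
    FancyIndexSketch(const FancyIndexSketch&) = delete;
    FancyIndexSketch& operator=(const FancyIndexSketch&) = delete;
};
```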
Adaptive MPI (AMPI)
- Compilers renamed to avoid collision with host MPI (ampicc, ampiCC,
ampif77, ampif90)
- Improved MPI standard conformance, and documentation of non-conformance
* Bug fixes in: MPI_Ssend, MPI_Cart_shift, MPI_Get_count
* Support MPI_IN_PLACE in MPI_(All)Reduce
* Define various missing constants
- Return the received message's tag in response to a non-blocking
wildcard receive, to support SuperLU
- Improved tracing for BigSim
Multiphase Shared Arrays (MSA)
- Typed handles to enforce phases
- Split-phase synchronization to enable message-driven execution
- 3D arrays
TCharm
- Automatic tracing of API calls for simulation and analysis
Debugging
- Wider support for architectures other than net- (in particular MPI layers)
- Improved support for large scale debugging (better scalability)
- Enhanced record/replay stability to handle various events, and to
signal unexpected messages
- New detailed record/replay: The full content of messages can be
recorded, and a single processor can be re-executed outside of the
parallel application
Performance analysis
- Tracing of nested entry methods
Automatic Performance Tuning
- Created an automatic tuning framework [still for experimental use only]
CkMulticast
- Network-topology / node aware spanning trees are used internally, for
fewer bytes on the network and improved performance in multicasts and
reductions delegated to this library
Comlib
- Improved OneTimeMulticastStrategy classes
BigSim
- Out-of-core support, with prefetching capability
- Detailed tracing of MPI calls
- Detailed record/replay support at emulation time, capable of
replaying any emulated processor after obtaining recorded logs.
--------------------------------------------------------------------------------
Architecture-specific changes
--------------------------------------------------------------------------------
Net-*
- Can run jobs with more than 1024 PEs
Net-Linux
- New charmrun option ++no-va-randomization to disable address space
randomization (ASLR). This is most useful for running AMPI with
isomalloc
MPI
- Default to using mpicxx instead of mpiCC
MPI-SMP
- The +p option now has the same semantics as in other smp builds
Power 7
- Support for VSX in SIMD abstraction API
Blue Gene/L
- Compilers and options have been updated to the latest ones
Blue Gene/P
- Added routines for measuring performance counters on BG/P.
- Updated to support latest DCMF driver version. On ANL's Intrepid, you may
need to set BGP_INSTALL=/bgsys/drivers/V1R4M1_460_2009-091110P/ppc in your
environment. This is the default on ANL's Surveyor.
Cray XT
- cputopology information is now available on XT3/4/5
Infiniband (ibverbs)
- Bug fix: plug memory leaks that caused failures in long runs
- Optimized to reduce startup delays
LAPI
- Support for SMP (experimental)
================================================================================
Note that changes from 5.9, 6.0, and 6.1 are not documented here. A partial list
can be found on the charm download page, or by reading through version control
logs.
================================================================================
What's New since Charm++ 5.4 release 1
================================================================================
--------------------------------------------------------------------------------
New Supported Platforms:
--------------------------------------------------------------------------------
1. Charm++ ported to IA64 Itanium running Win2K and Linux. Charm++ also supports
Intel C/C++ compilers;
2. Charm++ ported to Power Macintosh powerpc running Darwin;
3. Charm++ ported to Myrinet networking with GM API;
--------------------------------------------------------------------------------
Summary of New Features:
--------------------------------------------------------------------------------
1. Structured Dagger
Structured Dagger is a coordination language built on top of CHARM++.
Structured Dagger allows easy expression of dependences among messages and
computations and also among computations within the same object using
when-blocks and various structured constructs.
2. Entry functions support parameter marshalling
Now you can declare and invoke remote entry functions using parameter
marshalling instead of defining messages.
3. Easier running - standalone mode
For net-* versions running locally, you can now run Charm programs without
charmrun. Running a node program directly from the command line is now the
same as "charmrun +p1 <program>"; for the SMP version, you can also specify
multiple (local) processors, as in "program +p2".
--------------------------------------------------------------------------------
Summary of Changes:
--------------------------------------------------------------------------------
1. "build" changed for compilation of Charm++
To build Charm++ from scratch, the build script now takes additional command
line options to compile with add-on features and to use compilers other than
gcc. For example, to build for Linux IA64 with Myrinet support, type the command:
./build net-linux-ia64 gm
******* Old Change histories *******
================================================================================
What's New in Charm++ 5.4 release 1 since 5.0
================================================================================
--------------------------------------------------------------------------------
New Supported Platforms:
--------------------------------------------------------------------------------
1. Win9x/2000/NT: with Visual C++ or Cygwin gcc/g++, you can compile and run
Charm++ programs on all Win32 platforms.
2. Scyld Beowulf: Charm++ has been ported to the Linux-based Scyld Beowulf
operating system. For more information on Scyld, see <http://www.scyld.com>
3. MPI with VMI: Charm++ has been ported to NCSA's Virtual Machine Interface,
which is an efficient messaging library for heterogeneous cluster
communication.
--------------------------------------------------------------------------------
Summary of New Features:
--------------------------------------------------------------------------------
1. Dynamic Load balancing:
Chare migration is supported in the new release. A migration-based dynamic
load balancing framework with a library of load balancing strategies has
been added.
2. Chare Array
Charm++ arrays are supported. You can now create an array of Chare objects
and use an array index to refer to the Charm++ array elements. A reduction
library on top of Chare arrays has been implemented and included.
3. Projections
Projections, a Java application for Charm++ program performance analysis and
visualization, has been included and distributed in the new release. Two
trace modes are available: trace-projections and trace-summary. Trace-summary
is a light-weight trace library compared to trace-projections.
4. AMPI
AMPI is a load-balancing based library for porting legacy MPI applications
to Charm++. With few changes to the original MPI code, a legacy MPI
application running on AMPI gains Charm++'s adaptive load balancing
ability.
5. Easier invocation
"Charmrun" is now available on all platforms, with a uniform command line
syntax. You can forget the difference between net-* versions and MPI versions,
and run Charm++ applications with the same charmrun command syntax.
A ++local option has been added to charmrun for net-* versions. It provides
simple local use of Charm and no longer requires the ability to
"rsh localhost" or a nodelist file in order to run Charm only on the local
machine. This is especially attractive when you run Charm++ on Windows.
6. New libraries:
Many new libraries have been added in this release. They include:
1) master-slave library: for writing manager-worker paradigm programs.
2) receiver library: provides asynchronous communication mode for chare array.
3) f90charm: provides Fortran90 bindings for Charm++ Array.
4) BlueGene: a Charm++/Converse emulator for IBM's proposed Blue Gene.
--------------------------------------------------------------------------------
Summary of Changes:
--------------------------------------------------------------------------------
1. message declaration syntax in .ci file:
The message declaration syntax for packed/varsize messages has been changed.
The packed/varsize keywords are eliminated, and you can specify the
actual varsize arrays in the interface file and have the translator generate
alloc, pack and unpack.
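As a rough illustration of the work that alloc, pack and unpack routines for
a message with one variable-size array have to do, here is a minimal
standalone C++ sketch. It is not the code the translator generates. VarMsg,
allocVarMsg, packVarMsg and unpackVarMsg are hypothetical names chosen for
this example, and the layout (header followed directly by the array in one
block) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical message with one variable-size array (illustration only).
struct VarMsg {
    int n;         // number of elements in data
    double *data;  // points into the same allocation, right after the header
};

// The pack step below stores an offset in the pointer field, so the two
// types must have the same size on the target platform.
static_assert(sizeof(std::ptrdiff_t) == sizeof(double *),
              "offset must fit in the pointer field");

// alloc: one contiguous block holding the header followed by n doubles.
VarMsg *allocVarMsg(int n) {
    std::size_t bytes = sizeof(VarMsg) + n * sizeof(double);
    void *raw = std::malloc(bytes);
    if (!raw) throw std::bad_alloc();
    VarMsg *m = static_cast<VarMsg *>(raw);
    m->n = n;
    m->data = reinterpret_cast<double *>(m + 1);
    return m;
}

// pack: replace the pointer by its byte offset from the start of the
// message, so the block can be copied as-is to another address space.
void *packVarMsg(VarMsg *m) {
    std::ptrdiff_t off =
        reinterpret_cast<char *>(m->data) - reinterpret_cast<char *>(m);
    std::memcpy(&m->data, &off, sizeof(off));
    return m;
}

// unpack: turn the stored offset back into a pointer valid in this buffer.
VarMsg *unpackVarMsg(void *buf) {
    VarMsg *m = static_cast<VarMsg *>(buf);
    std::ptrdiff_t off;
    std::memcpy(&off, &m->data, sizeof(off));
    m->data = reinterpret_cast<double *>(static_cast<char *>(buf) + off);
    return m;
}
```

A caller would fill m->data after allocVarMsg, copy the packed block to its
destination, and call unpackVarMsg there before reading the array.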
Here is the detailed list of Changes:
--------------------------------------------------------------------------------
Major Features:
--------------------------------------------------------------------------------
10/06/1999 rbrunner Added migration-based dynamic load balancing
framework.
11/15/1999 olawlor Added reduction support for Charm++ arrays
02/06/2000 milind Added AMPI, an implementation of MPI with
dynamic load balancing
02/18/2000 paranjpy New platforms supported: net-win32, and net-win32-smp
04/04/2000 olawlor Added arbitrarily indexed Charm++ arrays.
Also, added translator support for new arrays.
04/15/2000 olawlor Added "puppers" for packing and unpacking
objects.
06/14/2000 milind Added the threaded FEM framework.
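The "puppers" entry above refers to objects that let a single routine size,
pack and unpack an object. The standalone C++ sketch below illustrates that
idea only. Pupper, Particle and every member name are hypothetical and are
not the Charm++ PUP interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical mini "pupper": one traversal routine, three modes.
class Pupper {
public:
    enum Mode { Sizing, Packing, Unpacking };

    explicit Pupper(Mode mode, char *buf = nullptr)
        : mode_(mode), buf_(buf), pos_(0) {}

    bool isUnpacking() const { return mode_ == Unpacking; }
    std::size_t size() const { return pos_; }  // bytes visited so far

    // Process n raw bytes according to the current mode.
    void bytes(void *p, std::size_t n) {
        if (mode_ == Packing) std::memcpy(buf_ + pos_, p, n);
        else if (mode_ == Unpacking) std::memcpy(p, buf_ + pos_, n);
        pos_ += n;  // Sizing mode only counts
    }

    // Convenience for trivially copyable values such as int or size_t.
    template <class T> void operator()(T &v) { bytes(&v, sizeof(T)); }

private:
    Mode mode_;
    char *buf_;
    std::size_t pos_;
};

// Example object: the same pup routine serves all three modes.
struct Particle {
    int id = 0;
    std::vector<double> coords;

    void pup(Pupper &p) {
        p(id);
        std::size_t n = coords.size();
        p(n);                                   // read back when unpacking
        if (p.isUnpacking()) coords.resize(n);  // make room before copying
        if (n) p.bytes(coords.data(), n * sizeof(double));
    }
};
```

In use, the object is first visited with a Sizing pupper to learn the buffer
size, then with a Packing pupper to fill the buffer, and finally a fresh
object is visited with an Unpacking pupper on the receiving side.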
--------------------------------------------------------------------------------
Minor Features:
--------------------------------------------------------------------------------
10/09/1999 rbrunner Added packlib, a library for C and C++ to
pack-unpack data to/from Charm++ messages.
10/13/1999 gzheng New LB strategy: RefineLB
10/13/1999 paranjpy New LB Strategy: Heap
10/14/1999 milind New LB Strategy: Metis
10/19/1999 olawlor New test program for testing LB strategies.
10/21/1999 gzheng New trace mode: trace-summary
10/28/1999 milind New supported platform: net-sol-x86
10/29/1999 milind Added runtime checks for ChareID assignment.
11/10/1999 rbrunner Added Neighborhood base strategy for LB
framework.
11/15/1999 olawlor conv-host now reads in a startup file
~/.conv-hostrc
11/15/1999 olawlor New test program for testing array reductions.
11/16/1999 rbrunner Added processor-speed checking functions to
LB framework
11/19/1999 milind Mapped SIGUSR to a Ccd condition handler
11/22/1999 rbrunner New LB strategy: WSLB
11/29/1999 ruiliu Modified Metis LB strategy to deal with
different processor speeds
12/16/1999 rbrunner New LB strategy: GreedyRef
12/16/1999 rbrunner New LB strategy: RandRef
12/21/1999 skumar2 New LB strategy: CommLB
01/03/2000 rbrunner New LB strategy: RecBisectBfLB
01/08/2000 skumar2 New LB strategy: Comm1LB, with varying processor
speeds
01/18/2000 milind Modified SM library syntax, and added a test
program for SM.
01/19/2000 gzheng Added irecv, a library to simplify conversion
of message-passing programs to Charm++
02/20/2000 olawlor Added preliminary broadcast support to Charm++
arrays.
02/23/2000 paranjpy Added converse-level quiescence detection
03/02/2000 milind Added ++server-port option to pre-specify
CCS port.
03/10/2000 wilmarth Random seed-based load balancer now uses
bit-vector for active PEs.
03/21/2000 gzheng Added support for marking user-defined events
in trace-summary.
03/28/2000 wilmarth Added CMK_TRUECRASH. Very helpful for
post-mortem debugging of Charm++ programs on
net-* versions.
03/31/2000 jdesouza Added Fortran90 support to the Charm++
interface translator.
03/09/2000 milind Added support for -LANG and -rpath options
in charmc for Origin2000.
04/28/2000 milind Added prioritized converse threads.
05/01/2000 milind Added test programs for TeMPO, AMPI and irecv.
05/04/2000 milind New supported platform: mpi-sp.
05/04/2000 gzheng Added irecv pingpong program.
05/17/2000 olawlor Each chare, group and array element now has to
have a migration constructor.
05/24/2000 milind Added Jacobi3D programs for both irecv and AMPI.
05/24/2000 milind Made migratable an optional attribute of
chares, groups, and nodegroups.
Arrays are by default migratable.
05/29/2000 paranjpy Added pup methods to internal objects such as
arrays and reductions.
06/13/2000 milind Made CtvInitialize idempotent. That is, it
can be called by any number of threads now,
only the first one will actually do
CtvInitialize.
06/20/2000 milind Added a simple test program for the FEM
framework.
07/06/2000 milind Imported Metis 4.0 sources in the CVS tree.
Also added code to make metis libraries and
executables to Makefile.
07/07/2000 milind Added more meaningful error messages using
perror in addition to the cryptic error codes in
net-* versions.
07/10/2000 milind fem and femf are now recognized as "languages"
by charmc.
07/10/2000 saboo Added the derived datatypes library.
07/13/2000 milind Added +idle_timeout functionality. It takes a
commandline parameter denoting milliseconds of
maximum consecutive idle time allowed per
processor.
07/14/2000 milind Added group multicast. Added
CkSendMsgBranchMulti, CldEnqueueMulti, and
translator changes to support it.
07/14/2000 milind SUPER_INSTALL now takes "-*" arguments prior
to the target, that will be passed to make as
"makeflags". This makes it easy to suppress
make's output of commands etc (with the -s
flag). As a result of this, several Makefiles
have been massaged.
07/18/2000 milind Added support for using "dbx" on suns as
debugger.
07/19/2000 milind Added the ability for tracemode projections to
produce binary trace files. Use flag
+binary-trace on the command line.
07/26/2000 milind Separated AMPI from TeMPO.
07/28/2000 milind Added test programs to test reduce, alltoall
and allreduce functionality of AMPI.
08/02/2000 milind Added an option to let the user specify which
"xterm" to use. For example, on some systems
(CDE), only dtterm is installed. So, by
putting ++xterm dtterm on the conv-host
commandline, one can use dtterm when ++in-xterm
option is specified on conv-host commandline.
08/14/2000 milind FEM Framework: Added capabilities to handle
esoteric meshes to standalone offline programs.
Makefile now produces gmap and fgmap programs,
which are used for this purpose. They convert
the mesh to a graph before partitioning it
using Metis.
08/24/2000 milind Added the 2D crack propagation program as a
test program for FEM framework.
08/25/2000 milind Initial implementation of isomalloc-based
threads. This implementation uses a fixed
stack size for all threads (can be set at
runtime.)
08/26/2000 milind Added a macro CtvAccessOther that lets you
get/set a Ctv variable of any thread. It
should be invoked as CtvAccessOther(thread,
varname); Added CthGetData function to each of
the threads implementation. This function is
used in the CtvAccessOther macro.
08/27/2000 milind FEM Framework: Separated mesh to graph
conversion capability into a separate program.
This way, the generated graph can be partitioned
repeatedly.
09/04/2000 milind Added the class static readonly variables to
ci file syntax.
09/05/2000 milind FEM Framework: A very fast O(n) algorithm for
mesh2graph. It uses more memory, but the tradeoff
was worth it. Coded by Karthik Mahesh, minor
optimizations by Milind.
09/05/2000 milind Added a barebones charm kernel scheduling
overhead measurement program.
09/15/2000 milind Added pup support for AMPI and FEM framework.
09/20/2000 olawlor Added capability to have an array of base type
whose individual elements can be of derived
types.
10/03/2000 gzheng New supported platform: net-linux-axp
10/05/2000 skumar2 Added program littleMD to the test suite.
10/07/2000 skumar2 New job scheduler (Faucets projects).
10/15/2000 milind Improved support for Fortran90 in charmc.
11/04/2000 jdesouza Made the Faucets scheduler multi-threaded.
11/05/2000 olawlor FEM Framework: supports multiple element types,
mesh re-assembly, etc.
11/15/2000 gzheng New platform support: net-cygwin
11/18/2000 gzheng conv-host no longer needs /bin/csh to start
the remote program. Set
CMK_CONV_HOST_CSH_UNAVAILABLE to 1 to use
/bin/sh instead.
11/25/2000 milind Finished experimental implementation of
converse-threads based on co-operative pthreads.
11/25/2000 milind Added a benchmark suite of all pingpongs in
Charm++.
11/28/2000 milind Removed deletion of _idx at the end of every
send or doneInserting call. Instead now it is
in the destructor of the proxy. This allows us
to cache proxies, when proxy creation becomes
a bottleneck.
11/28/2000 olawlor Added "seek blocks" to puppers. This should
allow out-of-order pup'ing without the ugliness
of getBuf; and in a way that works with all
PUP::ers.
11/29/2000 olawlor Simplified and regularized command-line-argument
handling.
11/29/2000 milind AMPI: Added multiple-communicators capability.
12/05/2000 gzheng /bin/sh is now the default shell used to fork
the node program on remote machines.
12/13/2000 olawlor Added charmrun wrapper for poe on mpi-sp.
12/14/2000 milind Added bluegene emulator sources and test
programs. Added "bluegene" as a language known
to charmc. Makefile now has a target called
bluegene. Added preliminary bluegene
documentation. (copied from Arun's webpage.)
12/15/2000 gzheng f90charm addition to Makefile and charmc. Also,
added fixed size arrays support to f90charm. A
test program f90charm/hello is checked in.
12/17/2000 milind Added rtest test program. Contributed by jim to
test Converse message transmission.
12/20/2000 olawlor Added charmconfig script. Enables automatic
determination of C++ compiler properties,
replacing the verbose and error-prone
conv-mach.h entries for CMK_BOOL,
CMK_STL_USE_DOT_H, CMK_CPP_CAST_OK, ...
12/20/2000 olawlor Charm++ Arrays optimizations: Key and object
now variable-length fields, instead of pointers.
This extra flexibility lets us save many
dynamic allocations in the array framework.
12/20/2000 olawlor Added PUP::able support-- dynamic type
identification, allocation, and deletion.
Allows you to write: p(objPtr); and
objPtr will be properly identified,
allocated, packed, and deallocated (depending
on the PUP::er). Requires you to register any
such classes with DECLARE_PUPable and
DEFINE_PUPable.
12/20/2000 olawlor Arrays optimizations: Made CkArrayIndex
fixed-size. This significantly improves
messaging speed (7 us instead of 10 us
roundtrip). Move spring cleaning check into a
CcdCallFnAfter, which gains more speed (down to
4 us roundtrip).
12/20/2000 olawlor More optimizations: Minor speed tweaks--
conv-ccs.c uses hashtable for handler lookup;
conv-conds skips timer test until needed;
convcore.c scheduler loop optimizations (no
superfluous EndIdle calls); threads.c
CMK_OPTIMIZE-> no mprotect.
12/20/2000 olawlor More Optimizations: Minor speed tweaks-- ck.C
groups cldEnqueue skip; init.h defines
CkLocalBranch inline; and supporting changes.
12/22/2000 gzheng IA64 support for Converse user level threads.
01/02/2001 olawlor CCS: Minor update-- enabled CcsProbe, cleaned
up superfluous debug messages in server, added
Java interface (originally written for
AppSpecter).
01/09/2001 gzheng charmconfig converted to autoconf style. To
change it, edit configure.in and conv-autoconfig.h.in,
run autoconf to get configure, and copy it to
charmconfig. Added fortran subroutine name
test and get libpthread.a
01/10/2001 milind Added telnet method of getting libpthread.a
from charm webserver.
01/11/2001 olawlor Moved projections files here from
CVSROOT/projections-java. Added fast Java
versions of the .log file input routines in
LogReader, LogLoader, LogAnalyzer, and
UsageCalc. Added "U.java" user interface
utility file, allowing times to be input in
seconds, milliseconds, or microseconds,
instead of just microseconds.
01/15/2001 gzheng Added +trace-root to specify the directory to
put log files in. This is needed on Scyld clusters,
where there is no NFS mounting and no I/O
access to a shared home directory on the nodes.
01/15/2001 milind Made AMPI into a f90 module instead of
'ampif.h' inclusion. AMPI f90 bindings are
now more inclusive. Fixed argc,argv handling
bugs in ArgsInfo message. Fixed a bug in pup
that caused the thread not to be sized even
though it was packed. Moved irecv to waitall
instead of in ampi_start. Made
AMPI_COMM_WORLD to be 0, because it clashed
with wildcard(-1). AMPI_COMM_UNIVERSE is now
handled properly in the AMPI module.
C/C++ data members are NOT visible to
Fortran 90.
01/18/2001 gzheng New supported platform: net-linux-scyld
01/20/2001 olawlor Moved array index field from CMessage_* to the
Ck envelope itself. This is the right thing
to do, because any message may be sent to/from
an array element. To reduce the wasted space
in a message, a union is used to overlay the
fields for the various possible message types.
01/29/2001 olawlor Freed charmrun on net-* version from using
remote shell to fork off processes. One can now
use a daemon provided in the distribution.
02/07/2001 olawlor Added debugging support to puppers.
02/13/2000 gzheng Added ++local option to charmrun to start node
program locally without any daemon. Fixed the
hang that occurred in the scyld version when a
wrong pgm name was typed, and redirected all
output to /dev/null, since otherwise every node
program could send its output to the console in
scyld. Also implemented ++local in net-win32
version.
02/26/2000 milind Changed the varsize syntax. Now one can specify
actual varsize arrays in the interface file
and have the translator generate alloc, pack
and unpack.
--------------------------------------------------------------------------------
Bug Fixes:
--------------------------------------------------------------------------------
10/29/1999 milind Replaced jmemcpy by memcpy in net versions, as
it was causing a bit to flip (bug reported
by jim.)
10/29/1999 milind Fixed multiline macros in all header files.
02/05/2000 milind Fixed linking errors by getting the order of
libraries right from the charmc command-line.
02/18/2000 paranjpy Fixed Charm++ initialization bug on SMPs.
02/21/2000 milind Fixed a context-switching bug in mipspro version
of QuickThreads.
02/25/2000 milind Charm++ interface translator was segfaulting
on interface file errors. Fixed that. Also,
added linenumbers to error messages.
03/02/2000 milind Made CCS work on SMPs.
03/07/2000 milind Made ConverseInit consistent with the manual on
Origin2000 version.
04/18/2000 milind Fixed a bug in CkWaitFuture, which was caching
a variable locally, while it was changed by
another thread.
05/04/2000 paranjpy Fixed argv deletion bug on net-win32-smp.
06/08/2000 milind sp3 version: changed optimization flags, which
were power2 processor-specific.
06/20/2000 milind mpi-* versions: Fixed ConverseExit since it was
not obeying the following statement in the MPI
standard: The user must ensure that all pending
communications involving a process completes
before the process calls MPI_FINALIZE.
07/05/2000 milind Fixed a nasty bug in charmc in the -cp option.
It used to append the name provided to -o flag
to the directory provided to the -cp flag.
Thus, -o ../pgm -cp ../bin options meant that
the pgm would be copied to ../bin/.., which is
not the expected behavior. This fix correctly
copies pgm to ../bin.
07/07/2000 milind Removed variable arg_myhome, as it was not
being used anywhere, and also, setting it was
causing problems if env var HOME was not set.
07/27/2000 milind thishandle for the arrayelement was not being
correctly set. Bug was reported by Neelam.
08/26/2000 milind Origin2000: Changed the page alignment to
reflect the mmap alignment. The mmap man page
specifically states that it is not the same as
page size.
09/02/2000 milind Fixed a bug in code generated for threaded
(void) entry methods of array elements. The
dummy message that is passed to that method in
a thread has to be deleted before calling the
object method, because upon object method's
return, the thread might have migrated.
09/03/2000 olawlor Minor fix-fixes: 1.) Change to LBObjid hash
function would fail for >4-int object indices.
Replaced with proper function, which also
preserves the 1-int case. 2.) Array element
sends must go via the message queue to prevent
stack build-up for deep single-processor call
chains. These might happen, e.g., in a driver