diff --git a/MLOPT-intro06a/MLOPT-intro06a.pdf b/MLOPT-intro06a/MLOPT-intro06a.pdf new file mode 100644 index 0000000..97d6fbb Binary files /dev/null and b/MLOPT-intro06a/MLOPT-intro06a.pdf differ diff --git a/MLOPT-intro06a/info.json b/MLOPT-intro06a/info.json new file mode 100644 index 0000000..6b02eee --- /dev/null +++ b/MLOPT-intro06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "The fields of machine learning and mathematical\nprogramming are increasingly intertwined. Optimization problems\nlie at the heart of most machine learning approaches. The Special\nTopic on Machine Learning and Large Scale Optimization examines\nthis interplay. Machine learning researchers have embraced the\nadvances in mathematical programming allowing new types of models\nto be pursued. The special topic includes models using quadratic,\nlinear, second-order cone, semi-definite, and semi-infinite\nprograms. We observe that the qualities of good optimization\nalgorithms from the machine learning and optimization perspectives\ncan be quite different. Mathematical programming puts a premium on\naccuracy, speed, and robustness. Since generalization is the\nbottom line in machine learning and training is normally done\noff-line, accuracy and small speed improvements are of little\nconcern in machine learning. Machine learning prefers simpler\nalgorithms that work in reasonable computational time for\nspecific classes of problems. Reducing machine learning problems\nto well-explored mathematical programming classes with robust\ngeneral purpose optimization codes allows machine learning\nresearchers to rapidly develop new techniques. In turn, machine\nlearning presents new challenges to mathematical programming. The\nspecial issue include papers from two primary themes: novel\nmachine learning models and novel optimization approaches for\nexisting models. Many papers blend both themes, making small\nchanges in the underlying core mathematical program that enable\nthe develop of effective new algorithms.", + "authors": [ + "Kristin P. Bennett", + "Emilio Parrado-Hern{{\\'a}}ndez" + ], + "id": "MLOPT-intro06a", + "issue": 45, + "pages": [ + 1265, + 1281 + ], + "title": "The Interplay of Optimization and Machine Learning Research", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/MLSEC-intro06a/MLSEC-intro06a.pdf b/MLSEC-intro06a/MLSEC-intro06a.pdf new file mode 100644 index 0000000..5b9771a Binary files /dev/null and b/MLSEC-intro06a/MLSEC-intro06a.pdf differ diff --git a/MLSEC-intro06a/info.json b/MLSEC-intro06a/info.json new file mode 100644 index 0000000..5de7ced --- /dev/null +++ b/MLSEC-intro06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "The prevalent use of computers and internet has enhanced the quality\nof life for many people, but it has also attracted undesired attempts\nto undermine these systems. This special topic contains several\nresearch studies on how machine learning algorithms can help improve\nthe security of computer systems.", + "authors": [ + "Philip K. Chan", + "Richard P. 
Lippmann" + ], + "id": "MLSEC-intro06a", + "issue": 95, + "pages": [ + 2669, + 2672 + ], + "title": "Machine Learning for Computer Security", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/abbeel06a/abbeel06a.pdf b/abbeel06a/abbeel06a.pdf new file mode 100644 index 0000000..78d7c18 Binary files /dev/null and b/abbeel06a/abbeel06a.pdf differ diff --git a/abbeel06a/info.json b/abbeel06a/info.json new file mode 100644 index 0000000..d4a505e --- /dev/null +++ b/abbeel06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We study the computational and sample complexity of parameter and\nstructure learning in graphical models. Our main result shows that\nthe class of factor graphs with bounded degree can be learned in\npolynomial time and from a polynomial number of training examples,\nassuming that the data is generated by a network in this class. This\nresult covers both parameter estimation for a known network structure\nand structure learning. It implies as a corollary that we can learn\nfactor graphs for both Bayesian networks and Markov networks of\nbounded degree, in polynomial time and sample complexity. Importantly,\nunlike standard maximum likelihood estimation algorithms, our method\ndoes not require inference in the underlying network, and so applies\nto networks where inference is intractable. We also show that the\nerror of our learned model degrades gracefully when the generating\ndistribution is not a member of the target class of networks. In\naddition to our main result, we show that the sample complexity of\nparameter learning in graphical models has an O(1) dependence\non the number of variables in the model when using the KL-divergence\nnormalized by the number of variables as the performance criterion.", + "authors": [ + "Pieter Abbeel", + "Daphne Koller", + "Andrew Y. Ng" + ], + "id": "abbeel06a", + "issue": 63, + "pages": [ + 1743, + 1788 + ], + "title": "Learning Factor Graphs in Polynomial Time and Sample Complexity", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/angluin06a/angluin06a.pdf b/angluin06a/angluin06a.pdf new file mode 100644 index 0000000..df8e2cb Binary files /dev/null and b/angluin06a/angluin06a.pdf differ diff --git a/angluin06a/info.json b/angluin06a/info.json new file mode 100644 index 0000000..0c4c1d5 --- /dev/null +++ b/angluin06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We consider the problem of learning a hypergraph using edge-detecting queries.\nIn this model, the learner may query whether a set of vertices induces an \nedge of the hidden hypergraph or not.\nWe show that an r-uniform hypergraph with m edges and n \nvertices is learnable with O(24rm · \npoly(r,logn)) queries with high probability.\nThe queries can be made in O(min(2r \n(log m+r)2, (log m+r)3)) rounds.\nWe also give an algorithm that learns an almost uniform hypergraph of \ndimension r using O(2O((1+Δ/2)r) · \nm1+Δ/2 · poly(log n)) \nqueries with high probability,\nwhere Δ is the difference between the maximum and the minimum edge \nsizes. 
This upper bound matches our lower bound of \nΩ((m/(1+Δ/2))1+Δ/2) for this \nclass of hypergraphs in terms of dependence on m.\nThe queries can also be made in \nO((1+Δ) · min(2r (log m+r)2, \n(log m+r)3)) rounds.", + "authors": [ + "Dana Angluin", + "Jiang Chen" + ], + "id": "angluin06a", + "issue": 78, + "pages": [ + 2215, + 2236 + ], + "title": "Learning a Hidden Hypergraph", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/bach06a/bach06a.pdf b/bach06a/bach06a.pdf new file mode 100644 index 0000000..878c185 Binary files /dev/null and b/bach06a/bach06a.pdf differ diff --git a/bach06a/info.json b/bach06a/info.json new file mode 100644 index 0000000..b2133d2 --- /dev/null +++ b/bach06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "Receiver Operating Characteristic (ROC) curves are a standard way to\ndisplay the performance of a set of binary classifiers for all\nfeasible ratios of the costs associated with false positives and\nfalse negatives. For linear classifiers, the set of classifiers is\ntypically obtained by training once, holding constant the estimated\nslope and then varying the intercept to obtain a parameterized set\nof classifiers whose performances can be plotted in the ROC plane.\nWe consider the alternative of varying the asymmetry of the cost\nfunction used for training. We show that the ROC curve obtained by\nvarying both the intercept and the asymmetry, and hence the slope,\nalways outperforms the ROC curve obtained by varying only the\nintercept. In addition, we present a path-following algorithm for\nthe support vector machine (SVM) that can compute efficiently the\nentire ROC curve, and that has the same computational complexity as\ntraining a single classifier. Finally, we provide a theoretical\nanalysis of the relationship between the asymmetric cost model\nassumed when training a classifier and the cost model assumed in\napplying the classifier. In particular, we show that the mismatch\nbetween the step function used for testing and its convex upper\nbounds, usually used for training, leads to a provable and\nquantifiable difference around extreme asymmetries.", + "authors": [ + "Francis R. Bach", + "David Heckerman", + "Eric Horvitz" + ], + "id": "bach06a", + "issue": 62, + "pages": [ + 1713, + 1741 + ], + "title": "Considering Cost Asymmetry in Learning Classifiers", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/bach06b/bach06b.pdf b/bach06b/bach06b.pdf new file mode 100644 index 0000000..36e6ae4 Binary files /dev/null and b/bach06b/bach06b.pdf differ diff --git a/bach06b/info.json b/bach06b/info.json new file mode 100644 index 0000000..a4f3ebc --- /dev/null +++ b/bach06b/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "Spectral clustering refers to a class of techniques which rely on\nthe eigenstructure of a similarity matrix to partition points into\ndisjoint clusters, with points in the same cluster having high\nsimilarity and points in different clusters having low similarity.\nIn this paper, we derive new cost functions for spectral\nclustering based on measures of error between a given partition\nand a solution of the spectral relaxation of a minimum normalized\ncut problem. Minimizing these cost functions with respect to the\npartition leads to new spectral clustering algorithms. Minimizing\nwith respect to the similarity matrix leads to algorithms for\nlearning the similarity matrix from fully labelled data sets. 
We\napply our learning algorithm to the blind one-microphone speech\nseparation problem, casting the problem as one of segmentation\nof the spectrogram.", + "authors": [ + "Francis R. Bach", + "Michael I. Jordan" + ], + "id": "bach06b", + "issue": 70, + "pages": [ + 1963, + 2001 + ], + "title": "Learning Spectral Clustering, With Application To Speech Separation", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/barber06a/barber06a.pdf b/barber06a/barber06a.pdf new file mode 100644 index 0000000..0c87d2a Binary files /dev/null and b/barber06a/barber06a.pdf differ diff --git a/barber06a/info.json b/barber06a/info.json new file mode 100644 index 0000000..8ffe733 --- /dev/null +++ b/barber06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "We introduce a method for approximate smoothed inference in a class\nof switching linear dynamical systems, based on a novel form of\nGaussian Sum smoother. This class includes the switching Kalman\n'Filter' and the more general case of switch transitions dependent\non the continuous latent state. The method improves on the standard\nKim smoothing approach by dispensing with one of the key\napproximations, thus making fuller use of the available future\ninformation.\nWhilst the central assumption required is projection to a mixture of\nGaussians, we show that an additional conditional independence\nassumption results in a simpler but accurate alternative. Our method\nconsists of a single Forward and Backward Pass and is reminiscent of\nthe standard smoothing 'correction' recursions in the simpler linear\ndynamical system. The method is numerically stable and compares\nfavourably against alternative approximations, both in cases where a\nsingle mixture component provides a good posterior approximation,\nand where a multimodal approximation is required.", + "authors": [ + "David Barber" + ], + "id": "barber06a", + "issue": 88, + "pages": [ + 2515, + 2540 + ], + "title": "Expectation Correction for Smoothed Inference in Switching Linear Dynamical Systems", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/begleiter06a/begleiter06a.pdf b/begleiter06a/begleiter06a.pdf new file mode 100644 index 0000000..cd84f8b Binary files /dev/null and b/begleiter06a/begleiter06a.pdf differ diff --git a/begleiter06a/info.json b/begleiter06a/info.json new file mode 100644 index 0000000..19543e3 --- /dev/null +++ b/begleiter06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We present worst case bounds for the learning\nrate of a known prediction method that is based on hierarchical\napplications of binary context tree weighting (CTW) predictors. A\nheuristic application of this approach that relies on Huffman's alphabet\ndecomposition is known to achieve state-of-the-art performance\nin prediction and lossless compression benchmarks. We show that our\nnew bound for this heuristic is tighter than the best known\nperformance guarantees for prediction and lossless compression\nalgorithms in various settings. 
This result\nsubstantiates the efficiency of this hierarchical method and provides a compelling\nexplanation for its practical success.\nIn addition, we present the results of a few experiments that\nexamine other possibilities for improving the multi-alphabet\nprediction performance of CTW-based algorithms.", + "authors": [ + "Ron Begleiter", + "Ran El-Yaniv" + ], + "id": "begleiter06a", + "issue": 12, + "pages": [ + 379, + 411 + ], + "title": "Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/belkin06a/belkin06a.pdf b/belkin06a/belkin06a.pdf new file mode 100644 index 0000000..9450869 Binary files /dev/null and b/belkin06a/belkin06a.pdf differ diff --git a/belkin06a/info.json b/belkin06a/info.json new file mode 100644 index 0000000..7b862a8 --- /dev/null +++ b/belkin06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We propose a family of learning algorithms based on a new form of\nregularization that allows us to exploit the geometry of the marginal\ndistribution. We focus on a semi-supervised framework that\nincorporates labeled and unlabeled data in a general-purpose learner.\nSome transductive graph learning algorithms and standard methods\nincluding support vector machines and regularized least squares can be\nobtained as special cases. We use properties of reproducing kernel\nHilbert spaces to prove new Representer theorems that provide\ntheoretical basis for the algorithms. As a result (in contrast to\npurely graph-based approaches) we obtain a natural out-of-sample\nextension to novel examples and so are able to handle both\ntransductive and truly semi-supervised settings. We present\nexperimental evidence suggesting that our semi-supervised algorithms\nare able to use unlabeled data effectively. Finally we have a brief\ndiscussion of unsupervised and fully supervised learning within our\ngeneral framework.", + "authors": [ + "Mikhail Belkin", + "Partha Niyogi", + "Vikas Sindhwani" + ], + "id": "belkin06a", + "issue": 84, + "pages": [ + 2399, + 2434 + ], + "title": "Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/bergkvist06a/bergkvist06a.pdf b/bergkvist06a/bergkvist06a.pdf new file mode 100644 index 0000000..99cd1b9 Binary files /dev/null and b/bergkvist06a/bergkvist06a.pdf differ diff --git a/bergkvist06a/info.json b/bergkvist06a/info.json new file mode 100644 index 0000000..e2b541c --- /dev/null +++ b/bergkvist06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We consider an optimization problem in probabilistic inference: Given\nn hypotheses Hj, m possible \nobservations Ok, their\nconditional probabilities pkj, and a particular \nOk, select a\npossibly small subset of hypotheses excluding the true target only\nwith some error probability ε. After specifying the\noptimization goal we show that this problem can be solved through a\nlinear program in mn variables that indicate the probabilities to\ndiscard a hypothesis given an observation. Moreover, we can compute\noptimal strategies where only O(m+n) of these variables get\nfractional values. The manageable size of the linear programs and the\nmostly deterministic shape of optimal strategies makes the method\npracticable. 
We interpret the dual variables as worst-case\ndistributions of hypotheses, and we point out some counterintuitive\nnonmonotonic behaviour of the variables as a function of the error\nbound ε. One of the open problems is the existence of a\npurely combinatorial algorithm that is faster than generic linear\nprogramming.", + "authors": [ + "Anders Bergkvist", + "Peter Damaschke", + "Marcel L{{\\\"u}}thi" + ], + "id": "bergkvist06a", + "issue": 48, + "pages": [ + 1339, + 1355 + ], + "title": "Linear Programs for Hypotheses Selection in Probabilistic Inference Models", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/bhatnagar06a/bhatnagar06a.pdf b/bhatnagar06a/bhatnagar06a.pdf new file mode 100644 index 0000000..66e52cd Binary files /dev/null and b/bhatnagar06a/bhatnagar06a.pdf differ diff --git a/bhatnagar06a/info.json b/bhatnagar06a/info.json new file mode 100644 index 0000000..69f9b87 --- /dev/null +++ b/bhatnagar06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We study the problem of long-run average cost control of Markov chains\nconditioned on a rare event. In a related recent work, a simulation\nbased algorithm for estimating performance measures associated with a\nMarkov chain conditioned on a rare event has been developed. We extend\nideas from this work and develop an adaptive algorithm for obtaining,\nonline, optimal control policies conditioned on a rare event. Our\nalgorithm uses three timescales or step-size schedules. On the slowest\ntimescale, a gradient search algorithm for policy updates that is\nbased on one-simulation simultaneous perturbation stochastic\napproximation (SPSA) type estimates is used. Deterministic\nperturbation sequences obtained from appropriate normalized Hadamard\nmatrices are used here. The fast timescale recursions compute the\nconditional transition probabilities of an associated chain by\nobtaining solutions to the multiplicative Poisson equation (for a\ngiven policy estimate). Further, the risk parameter associated with\nthe value function for a given policy estimate is updated on a\ntimescale that lies in between the two scales above. We briefly sketch\nthe convergence analysis of our algorithm and present a numerical\napplication in the setting of routing multiple flows in communication\nnetworks.", + "authors": [ + "Shalabh Bhatnagar", + "Vivek S. Borkar", + "Madhukar Akarapu" + ], + "id": "bhatnagar06a", + "issue": 69, + "pages": [ + 1937, + 1962 + ], + "title": "A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/bickel06a/bickel06a.pdf b/bickel06a/bickel06a.pdf new file mode 100644 index 0000000..83c8e5e Binary files /dev/null and b/bickel06a/bickel06a.pdf differ diff --git a/bickel06a/info.json b/bickel06a/info.json new file mode 100644 index 0000000..38d5e33 --- /dev/null +++ b/bickel06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We give a review of various aspects of boosting, clarifying the\nissues through a few simple results, and relate our work and that of\nothers to the minimax paradigm of statistics. We consider the\npopulation version of the boosting algorithm and prove its\nconvergence to the Bayes classifier as a corollary of a general\nresult about Gauss-Southwell optimization in Hilbert space. We then\ninvestigate the algorithmic convergence of the sample version, and\ngive bounds to the time until perfect separation of the sample. 
We\nconclude by some results on the statistical optimality of the L2\nboosting.", + "authors": [ + "Peter J. Bickel", + "Ya'acov Ritov", + "Alon Zakai" + ], + "id": "bickel06a", + "issue": 24, + "pages": [ + 705, + 732 + ], + "title": "Some Theory for Generalized Boosting Algorithms", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/blanchard06a/blanchard06a.pdf b/blanchard06a/blanchard06a.pdf new file mode 100644 index 0000000..94eec89 Binary files /dev/null and b/blanchard06a/blanchard06a.pdf differ diff --git a/blanchard06a/info.json b/blanchard06a/info.json new file mode 100644 index 0000000..b71ed06 --- /dev/null +++ b/blanchard06a/info.json @@ -0,0 +1,19 @@ +{ + "abstract": "Finding non-Gaussian components of high-dimensional data is an\nimportant preprocessing step for efficient information processing.\nThis article proposes a new linear method to identify the\n\"non-Gaussian subspace\" within a very general semi-parametric\nframework. Our proposed method, called NGCA (non-Gaussian component\nanalysis), is based on a linear operator which, to any arbitrary\nnonlinear (smooth) function, associates a vector belonging to the\nlow dimensional non-Gaussian target subspace, up to an estimation\nerror. By applying this operator to a family of different nonlinear\nfunctions, one obtains a family of different vectors lying in a\nvicinity of the target space. As a final step, the target space\nitself is estimated by applying PCA to this family of vectors. We\nshow that this procedure is consistent in the sense that the\nestimaton error tends to zero at a parametric rate, uniformly over\nthe family, Numerical examples demonstrate the usefulness of our\nmethod.", + "authors": [ + "Gilles Blanchard", + "Motoaki Kawanabe", + "Masashi Sugiyama", + "Vladimir Spokoiny", + "Klaus-Robert M{{\\\"u}}ller" + ], + "id": "blanchard06a", + "issue": 8, + "pages": [ + 247, + 282 + ], + "title": "In Search of Non-Gaussian Components of a High-Dimensional Distribution", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/bratko06a/bratko06a.pdf b/bratko06a/bratko06a.pdf new file mode 100644 index 0000000..16d433e Binary files /dev/null and b/bratko06a/bratko06a.pdf differ diff --git a/bratko06a/info.json b/bratko06a/info.json new file mode 100644 index 0000000..a0ecf54 --- /dev/null +++ b/bratko06a/info.json @@ -0,0 +1,19 @@ +{ + "abstract": "Spam filtering poses a special problem in text categorization, of\nwhich the defining characteristic is that filters face an active\nadversary, which constantly attempts to evade filtering. Since spam\nevolves continuously and most practical applications are based on\nonline user feedback, the task calls for fast, incremental and robust\nlearning algorithms. In this paper, we investigate a novel approach to\nspam filtering based on adaptive statistical data compression\nmodels. The nature of these models allows them to be employed as\nprobabilistic text classifiers based on character-level or binary\nsequences. By modeling messages as sequences, tokenization and other\nerror-prone preprocessing steps are omitted altogether, resulting in a\nmethod that is very robust. The models are also fast to construct and\nincrementally updateable. We evaluate the filtering performance of two\ndifferent compression algorithms; dynamic Markov compression and\nprediction by partial matching. 
The results of our empirical\nevaluation indicate that compression models outperform currently\nestablished spam filters, as well as a number of methods proposed in\nprevious studies.", + "authors": [ + "Andrej Bratko", + "Gordon V. Cormack", + "Bogdan Filipič", + "Thomas R. Lynam", + "Bla{\\v{z}} Zupan" + ], + "id": "bratko06a", + "issue": 96, + "pages": [ + 2673, + 2698 + ], + "title": "Spam Filtering Using Statistical Data Compression Models", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/braun06a/braun06a.pdf b/braun06a/braun06a.pdf new file mode 100644 index 0000000..fc09a3a Binary files /dev/null and b/braun06a/braun06a.pdf differ diff --git a/braun06a/info.json b/braun06a/info.json new file mode 100644 index 0000000..b1e4707 --- /dev/null +++ b/braun06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "The eigenvalues of the kernel matrix play an important role in a\nnumber of kernel methods, in particular, in kernel principal component\nanalysis. It is well known that the eigenvalues of the kernel matrix\nconverge as the number of samples tends to infinity. We derive\nprobabilistic finite sample size bounds on the approximation error of\nindividual eigenvalues which have the important property that the\nbounds scale with the eigenvalue under consideration, reflecting the\nactual behavior of the approximation errors as predicted by asymptotic\nresults and observed in numerical simulations. Such scaling bounds\nhave so far only been known for tail sums of eigenvalues.\nAsymptotically, the bounds presented here have a slower than\nstochastic rate, but the number of sample points necessary to make\nthis disadvantage noticeable is often unrealistically large.\nTherefore, under practical conditions, and for all but the largest few\neigenvalues, the bounds presented here form a significant improvement\nover existing non-scaling bounds.", + "authors": [ + "Mikio L. Braun" + ], + "id": "braun06a", + "issue": 81, + "pages": [ + 2303, + 2328 + ], + "title": "Accurate Error Bounds for the Eigenvalues of the Kernel Matrix", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/buehlmann06a/buehlmann06a.pdf b/buehlmann06a/buehlmann06a.pdf new file mode 100644 index 0000000..7bb146e Binary files /dev/null and b/buehlmann06a/buehlmann06a.pdf differ diff --git a/buehlmann06a/info.json b/buehlmann06a/info.json new file mode 100644 index 0000000..d161e09 --- /dev/null +++ b/buehlmann06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "
We propose Sparse Boosting (the SparseL2Boost algorithm), a variant on\nboosting with the squared error loss. SparseL2Boost yields sparser\nsolutions than the previously proposed L2Boosting by minimizing some\npenalized L2-loss functions, the FPE model selection criteria, through\nsmall-step gradient descent. Although boosting may give already\nrelatively sparse solutions, for example corresponding to the\nsoft-thresholding estimator in orthogonal linear models, there is\nsometimes a desire for more sparseness to increase prediction accuracy\nand ability for better variable selection: such goals can be achieved\nwith SparseL2Boost.\n\nWe prove an equivalence of SparseL2Boost to Breiman's nonnegative\ngarrote estimator for orthogonal linear models and demonstrate the\ngeneric nature of SparseL2Boost for nonparametric interaction modeling.\nFor an automatic selection of the tuning parameter in SparseL2Boost we\npropose to employ the gMDL model selection criterion which can also be\nused for early stopping of L2Boosting. Consequently, we can select\nbetween SparseL2Boost and L2Boosting by comparing their gMDL scores.
", + "authors": [ + "Peter B{{\\\"u}}hlmann", + "Bin Yu" + ], + "id": "buehlmann06a", + "issue": 35, + "pages": [ + 1001, + 1024 + ], + "title": "Sparse Boosting", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/caponnetto06a/caponnetto06a.pdf b/caponnetto06a/caponnetto06a.pdf new file mode 100644 index 0000000..dab6f1b Binary files /dev/null and b/caponnetto06a/caponnetto06a.pdf differ diff --git a/caponnetto06a/info.json b/caponnetto06a/info.json new file mode 100644 index 0000000..344157a --- /dev/null +++ b/caponnetto06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We study some stability properties of algorithms which minimize\n(or almost-minimize) empirical error over Donsker classes of\nfunctions. We show that, as the number n of samples grows, the\nL2-diameter of the set of almost-minimizers of empirical error\nwith tolerance ξ(n)=o(n-1/2) \nconverges to zero in\nprobability. Hence, even in the case of multiple minimizers of\nexpected error, as n increases it becomes less and less likely that\nadding a sample (or a number of samples) to the training set will\nresult in a large jump to a new hypothesis. Moreover, under some\nassumptions on the entropy of the class, along with an assumption\nof Komlos-Major-Tusnady type, we derive a power rate of decay for\nthe diameter of almost-minimizers. This rate, through an\napplication of a uniform ratio limit inequality, is shown to\ngovern the closeness of the expected errors of the\nalmost-minimizers. In fact, under the above assumptions, the\nexpected errors of almost-minimizers become closer with a rate strictly\nfaster than n-1/2.", + "authors": [ + "Andrea Caponnetto", + "Alexander Rakhlin" + ], + "id": "caponnetto06a", + "issue": 90, + "pages": [ + 2565, + 2583 + ], + "title": "Stability Properties of Empirical Risk Minimization over Donsker Classes", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/castelo06a/castelo06a.pdf b/castelo06a/castelo06a.pdf new file mode 100644 index 0000000..6e81c33 Binary files /dev/null and b/castelo06a/castelo06a.pdf differ diff --git a/castelo06a/info.json b/castelo06a/info.json new file mode 100644 index 0000000..2a3448c --- /dev/null +++ b/castelo06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "Learning of large-scale networks of interactions from microarray\ndata is an important and challenging problem in bioinformatics. A\nwidely used approach is to assume that the available data constitute\na random sample from a multivariate distribution belonging to a\nGaussian graphical model. As a consequence, the prime objects of\ninference are full-order partial correlations which are\npartial correlations between two variables given the remaining ones.\nIn the context of microarray data the number of variables exceed the\nsample size and this precludes the application of traditional\nstructure learning procedures because a sampling version of\nfull-order partial correlations does not exist. In this paper we\nconsider limited-order partial correlations, these are\npartial correlations computed on marginal distributions of\nmanageable size, and provide a set of rules that allow one to assess\nthe usefulness of these quantities to derive the independence\nstructure of the underlying Gaussian graphical model. Furthermore,\nwe introduce a novel structure learning procedure based on a\nquantity, obtained from limited-order partial correlations, that we\ncall the non-rejection rate. 
The applicability and usefulness of\nthe procedure are demonstrated by both simulated and real data.", + "authors": [ + "Robert Castelo", + "Alberto Roverato" + ], + "id": "castelo06a", + "issue": 93, + "pages": [ + 2621, + 2650 + ], + "title": "A Robust Procedure For Gaussian Graphical Model Search From Microarray Data With p Larger Than n", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/castillo06a/castillo06a.pdf b/castillo06a/castillo06a.pdf new file mode 100644 index 0000000..41586f8 Binary files /dev/null and b/castillo06a/castillo06a.pdf differ diff --git a/castillo06a/info.json b/castillo06a/info.json new file mode 100644 index 0000000..68892cb --- /dev/null +++ b/castillo06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "This paper introduces a learning method for two-layer feedforward\nneural networks based on sensitivity analysis, which uses a linear\ntraining algorithm for each of the two layers. First, random values\nare assigned to the outputs of the first layer; later, these initial\nvalues are updated based on sensitivity formulas, which use the\nweights in each of the layers; the process is repeated until\nconvergence. Since these weights are learnt solving a linear system\nof equations, there is an important saving in computational time.\nThe method also gives the local sensitivities of the least square\nerrors with respect to input and output data, with no extra\ncomputational cost, because the necessary information becomes\navailable without extra calculations. This method, called the\nSensitivity-Based Linear Learning Method, can also be used to\nprovide an initial set of weights, which significantly improves the\nbehavior of other learning algorithms. The theoretical basis for the\nmethod is given and its performance is illustrated by its\napplication to several examples in which it is compared with several\nlearning algorithms and well known data sets. The results have shown\na learning speed generally faster than other existing methods. In\naddition, it can be used as an initialization tool for other well\nknown methods with significant improvements.", + "authors": [ + "Enrique Castillo", + "Bertha Guijarro-Berdi{{\\~n}}as", + "Oscar Fontenla-Romero", + "Amparo Alonso-Betanzos" + ], + "id": "castillo06a", + "issue": 41, + "pages": [ + 1159, + 1182 + ], + "title": "A Very Fast Learning Method for Neural Networks Based on Sensitivity Analysis", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/centeno06a/centeno06a.pdf b/centeno06a/centeno06a.pdf new file mode 100644 index 0000000..12c7774 Binary files /dev/null and b/centeno06a/centeno06a.pdf differ diff --git a/centeno06a/info.json b/centeno06a/info.json new file mode 100644 index 0000000..f5b0987 --- /dev/null +++ b/centeno06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "In this paper we consider a novel Bayesian interpretation of Fisher's\ndiscriminant analysis. We relate Rayleigh's coefficient to a noise\nmodel that minimises a cost based on the most probable class centres\nand that abandons the 'regression to the labels' assumption used by\nother algorithms. Optimisation of the noise model yields a direction \nof discrimination equivalent to Fisher's discriminant, and with the\nincorporation of a prior we can apply Bayes' rule to infer the\nposterior distribution of the direction of\ndiscrimination. Nonetheless, we argue that an additional constraining\ndistribution has to be included if sensible results are to be\nobtained. 
Going further, with the use of a Gaussian process prior we\nshow the equivalence of our model to a regularised kernel Fisher's\ndiscriminant. A key advantage of our approach is the facility to\ndetermine kernel parameters and the regularisation coefficient through\nthe optimisation of the marginal log-likelihood of the data. An\nadded bonus of the new formulation is that it enables us to link the\nregularisation coefficient with the generalisation error.", + "authors": [ + "Tonatiuh Pe{{\\~n}}a Centeno", + "Neil D. Lawrence" + ], + "id": "centeno06a", + "issue": 15, + "pages": [ + 455, + 491 + ], + "title": "Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/cesa-bianchi06a/cesa-bianchi06a.pdf b/cesa-bianchi06a/cesa-bianchi06a.pdf new file mode 100644 index 0000000..dc928ad Binary files /dev/null and b/cesa-bianchi06a/cesa-bianchi06a.pdf differ diff --git a/cesa-bianchi06a/info.json b/cesa-bianchi06a/info.json new file mode 100644 index 0000000..2141ce1 --- /dev/null +++ b/cesa-bianchi06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "
We study the problem of classifying data in a given taxonomy when\nclassifications associated with multiple and/or partial paths are\nallowed. We introduce a new algorithm that incrementally learns a\nlinear-threshold classifier for each node of the taxonomy. A\nhierarchical classification is obtained by evaluating the trained node\nclassifiers in a top-down fashion. To evaluate classifiers in our\nmultipath framework, we define a new hierarchical loss function, the\nH-loss, capturing the intuition that whenever a classification mistake\nis made on a node of the taxonomy, then no loss should be charged for\nany additional mistake occurring in the subtree of that node.\n\nMaking no assumptions on the mechanism generating the data instances,\nand assuming a linear noise model for the labels, we bound the H-loss\nof our on-line algorithm in terms of the H-loss of a reference\nclassifier knowing the true parameters of the label-generating process.\nWe show that, in expectation, the excess cumulative H-loss grows at\nmost logarithmically in the length of the data sequence. Furthermore,\nour analysis reveals the precise dependence of the rate of convergence\non the eigenstructure of the data each node observes.\n\nOur theoretical results are complemented by a number of experiments on\ntextual corpora. In these experiments we show that, after only one\nepoch of training, our algorithm performs much better than\nPerceptron-based hierarchical classifiers, and reasonably close to a\nhierarchical support vector machine.
", + "authors": [ + "Nicol{{\\'o}} Cesa-Bianchi", + "Claudio Gentile", + "Luca Zaniboni" + ], + "id": "cesa-bianchi06a", + "issue": 1, + "pages": [ + 31, + 54 + ], + "title": "Incremental Algorithms for Hierarchical Classification", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/cesa-bianchi06b/cesa-bianchi06b.pdf b/cesa-bianchi06b/cesa-bianchi06b.pdf new file mode 100644 index 0000000..4cc12e2 Binary files /dev/null and b/cesa-bianchi06b/cesa-bianchi06b.pdf differ diff --git a/cesa-bianchi06b/info.json b/cesa-bianchi06b/info.json new file mode 100644 index 0000000..c9854de --- /dev/null +++ b/cesa-bianchi06b/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "A selective sampling algorithm is a learning algorithm for\nclassification that, based on the past observed data, decides whether\nto ask the label of each new instance to be classified. In this\npaper, we introduce a general technique for turning linear-threshold\nclassification algorithms from the general additive family into\nrandomized selective sampling algorithms. For the most popular\nalgorithms in this family we derive mistake bounds that hold for\nindividual sequences of examples. These bounds show that our\nsemi-supervised algorithms can achieve, on average, the same accuracy\nas that of their fully supervised counterparts, but using fewer\nlabels. Our theoretical results are corroborated by a number of\nexperiments on real-world textual data. The outcome of these\nexperiments is essentially predicted by our theoretical results: Our\nselective sampling algorithms tend to perform as well as the\nalgorithms receiving the true label after each classification, while\nobserving in practice substantially fewer labels.", + "authors": [ + "Nicol{{\\'o}} Cesa-Bianchi", + "Claudio Gentile", + "Luca Zaniboni" + ], + "id": "cesa-bianchi06b", + "issue": 43, + "pages": [ + 1205, + 1230 + ], + "title": "Worst-Case Analysis of Selective Sampling for Linear Classification", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/chang06a/chang06a.pdf b/chang06a/chang06a.pdf new file mode 100644 index 0000000..24e0a00 Binary files /dev/null and b/chang06a/chang06a.pdf differ diff --git a/chang06a/info.json b/chang06a/info.json new file mode 100644 index 0000000..78165d7 --- /dev/null +++ b/chang06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "In this paper, we propose a number of adaptive prototype learning\n(APL) algorithms. They employ the same algorithmic scheme to determine\nthe number and location of prototypes, but differ in the use of\nsamples or the weighted averages of samples as prototypes, and also in\nthe assumption of distance measures. To understand these algorithms\nfrom a theoretical viewpoint, we address their convergence properties,\nas well as their consistency under certain conditions. We also present\na soft version of APL, in which a non-zero training error is allowed\nin order to enhance the generalization power of the resultant\nclassifier. 
Applying the proposed algorithms to twelve UCI benchmark\ndata sets, we demonstrate that they outperform many instance-based\nlearning algorithms, the k-nearest neighbor rule, and support vector\nmachines in terms of average test accuracy.", + "authors": [ + "Fu Chang", + "Chin-Chin Lin", + "Chi-Jen Lu" + ], + "id": "chang06a", + "issue": 75, + "pages": [ + 2125, + 2148 + ], + "title": "Adaptive Prototype Learning Algorithms: Theoretical and Experimental Studies", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/chen06a/chen06a.pdf b/chen06a/chen06a.pdf new file mode 100644 index 0000000..5d61b1f Binary files /dev/null and b/chen06a/chen06a.pdf differ diff --git a/chen06a/info.json b/chen06a/info.json new file mode 100644 index 0000000..96c321f --- /dev/null +++ b/chen06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "The consistency of classification algorithm plays a central role\nin statistical learning theory. A consistent algorithm guarantees\nus that taking more samples essentially suffices to roughly\nreconstruct the unknown distribution. We consider the consistency\nof ERM scheme over classes of combinations of very simple rules\n(base classifiers) in multiclass classification. Our approach is,\nunder some mild conditions, to establish a quantitative\nrelationship between classification errors and convex risks. In\ncomparison with the related previous work, the feature of our\nresult is that the conditions are mainly expressed in terms of the\ndifferences between some values of the convex function.", + "authors": [ + "Di-Rong Chen", + "Tao Sun" + ], + "id": "chen06a", + "issue": 85, + "pages": [ + 2435, + 2447 + ], + "title": "Consistency of Multiclass Empirical Risk Minimization Methods Based on Convex Loss", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/climer06a/climer06a.pdf b/climer06a/climer06a.pdf new file mode 100644 index 0000000..64bfc96 Binary files /dev/null and b/climer06a/climer06a.pdf differ diff --git a/climer06a/info.json b/climer06a/info.json new file mode 100644 index 0000000..f2675ae --- /dev/null +++ b/climer06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "Given a matrix of values in which the rows correspond to objects and\nthe columns correspond to features of the objects, rearrangement\nclustering is the problem of rearranging the rows of the matrix such\nthat the sum of the similarities between adjacent rows is maximized.\nReferred to by various names and reinvented several \ntimes, this clustering technique has been\nextensively used in many fields over the last three decades. In this paper, we\npoint out two critical pitfalls that have been previously overlooked.\nThe first pitfall is deleterious when rearrangement clustering is applied to\nobjects that form natural clusters. The second concerns a\nsimilarity metric that is commonly used. We present an algorithm that\novercomes these pitfalls. This algorithm is based on a variation of\nthe Traveling\nSalesman Problem. It offers an extra benefit as it\nautomatically determines cluster boundaries. Using this algorithm, we\noptimally solve four\nbenchmark problems and a 2,467-gene expression data clustering\nproblem. As expected, our new algorithm identifies better clusters \nthan those found by previous\napproaches in all five cases. Overall, \nour results demonstrate the benefits\nof rectifying the pitfalls and exemplify the usefulness of this\nclustering technique. 
Our code is available at our\nwebsites.", + "authors": [ + "Sharlee Climer", + "Weixiong Zhang" + ], + "id": "climer06a", + "issue": 31, + "pages": [ + 919, + 943 + ], + "title": "Rearrangement Clustering: Pitfalls, Remedies, and Applications", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/collobert06a/collobert06a.pdf b/collobert06a/collobert06a.pdf new file mode 100644 index 0000000..14c6c9d Binary files /dev/null and b/collobert06a/collobert06a.pdf differ diff --git a/collobert06a/info.json b/collobert06a/info.json new file mode 100644 index 0000000..c94b5a5 --- /dev/null +++ b/collobert06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "We show how the concave-convex procedure can be applied\nto transductive SVMs, which traditionally require solving\na combinatorial search problem. This\nprovides for the first time a highly scalable algorithm in the nonlinear\ncase.\nDetailed experiments verify the utility of our approach. Software\nis available at http://www.kyb.tuebingen.mpg.de/bs/people/fabee/transduction.html.", + "authors": [ + "Ronan Collobert", + "Fabian Sinz", + "Jason Weston", + "L{{\\'e}}on Bottou" + ], + "id": "collobert06a", + "issue": 61, + "pages": [ + 1687, + 1712 + ], + "title": "Large Scale Transductive SVMs", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/crammer06a/crammer06a.pdf b/crammer06a/crammer06a.pdf new file mode 100644 index 0000000..83a2dc2 Binary files /dev/null and b/crammer06a/crammer06a.pdf differ diff --git a/crammer06a/info.json b/crammer06a/info.json new file mode 100644 index 0000000..6345a6a --- /dev/null +++ b/crammer06a/info.json @@ -0,0 +1,19 @@ +{ + "abstract": "We present a family of margin based online learning algorithms for various\nprediction tasks. In particular we derive and analyze algorithms for binary and\nmulticlass categorization, regression, uniclass prediction and sequence\nprediction. \nThe update steps of our different algorithms are all based on analytical\nsolutions to simple constrained optimization problems. This unified view\nallows us to prove worst-case loss bounds for the different algorithms and for\nthe various decision problems based on a single lemma. Our bounds on the\ncumulative loss of the algorithms are relative to the smallest loss that can be\nattained by any fixed hypothesis, and as such are applicable to both realizable\nand unrealizable settings. We demonstrate some of the merits of the proposed\nalgorithms in a series of experiments with synthetic and real data sets.", + "authors": [ + "Koby Crammer", + "Ofer Dekel", + "Joseph Keshet", + "Shai Shalev-Shwartz", + "Yoram Singer" + ], + "id": "crammer06a", + "issue": 18, + "pages": [ + 551, + 585 + ], + "title": "Online Passive-Aggressive Algorithms", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/debie06a/debie06a.pdf b/debie06a/debie06a.pdf new file mode 100644 index 0000000..2740fa1 Binary files /dev/null and b/debie06a/debie06a.pdf differ diff --git a/debie06a/info.json b/debie06a/info.json new file mode 100644 index 0000000..154fcb1 --- /dev/null +++ b/debie06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "
The rise of convex programming has changed the face of many research\nfields in recent years, machine learning being one of the ones that\nbenefitted the most. A very recent development, the relaxation of\ncombinatorial problems to semi-definite programs (SDP), has gained\nconsiderable attention over the last decade (Helmberg, 2000; De Bie and\nCristianini, 2004a). Although SDP problems can be solved in polynomial\ntime, for many relaxations the exponent in the polynomial complexity\nbounds is too high for scaling to large problem sizes. This has\nhampered their uptake as a powerful new tool in machine learning.\n\nIn this paper, we present a new and fast SDP relaxation of the\nnormalized graph cut problem, and investigate its usefulness in\nunsupervised and semi-supervised learning. In particular, this provides\na convex algorithm for transduction, as well as approaches to\nclustering. We further propose a whole cascade of fast relaxations that\nall hold the middle between older spectral relaxations and the new SDP\nrelaxation, allowing one to trade off computational cost versus\nrelaxation accuracy. Finally, we discuss how the methodology developed\nin this paper can be applied to other combinatorial problems in machine\nlearning, and we treat the max-cut problem as an example.
", + "authors": [ + "Tijl De Bie", + "Nello Cristianini" + ], + "id": "debie06a", + "issue": 51, + "pages": [ + 1409, + 1436 + ], + "title": "Fast SDP Relaxations of Graph Cut Clustering, Transduction, and Other Combinatorial Problems", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/decampos06a/decampos06a.pdf b/decampos06a/decampos06a.pdf new file mode 100644 index 0000000..b36999c Binary files /dev/null and b/decampos06a/decampos06a.pdf differ diff --git a/decampos06a/info.json b/decampos06a/info.json new file mode 100644 index 0000000..c08e1bf --- /dev/null +++ b/decampos06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "We propose a new scoring function for learning Bayesian networks from\ndata using score+search algorithms. This is based on the concept of\nmutual information and exploits some well-known properties of this\nmeasure in a novel way. Essentially, a statistical independence test\nbased on the chi-square distribution, associated with the mutual\ninformation measure, together with a property of additive\ndecomposition of this measure, are combined in order to measure the\ndegree of interaction between each variable and its parent variables\nin the network. The result is a non-Bayesian scoring function called\nMIT (mutual information tests) which belongs to the family of scores\nbased on information theory. The MIT score also represents a\npenalization of the Kullback-Leibler divergence between the joint\nprobability distributions associated with a candidate network and with\nthe available data set. Detailed results of a complete experimental\nevaluation of the proposed scoring function and its comparison with\nthe well-known K2, BDeu and BIC/MDL scores are also presented.", + "authors": [ + "Luis M. de Campos" + ], + "id": "decampos06a", + "issue": 76, + "pages": [ + 2149, + 2187 + ], + "title": "A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/demsar06a/demsar06a.pdf b/demsar06a/demsar06a.pdf new file mode 100644 index 0000000..a9d621f Binary files /dev/null and b/demsar06a/demsar06a.pdf differ diff --git a/demsar06a/info.json b/demsar06a/info.json new file mode 100644 index 0000000..e04dd45 --- /dev/null +++ b/demsar06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "While methods for comparing two learning algorithms on a single\ndata set have been scrutinized for quite some time already, the\nissue of statistical tests for comparisons of more algorithms on\nmultiple data sets, which is even more essential to typical machine\nlearning studies, has been all but ignored. This article reviews\nthe current practice and then theoretically and empirically\nexamines several suitable tests. Based on that, we recommend a set\nof simple, yet safe and robust non-parametric tests for\nstatistical comparisons of classifiers: the Wilcoxon signed ranks\ntest for comparison of two classifiers and the Friedman test with\nthe corresponding post-hoc tests for comparison of more classifiers\nover multiple data sets. 
Results of the latter can also be neatly\npresented with the newly introduced CD (critical difference)\ndiagrams.", + "authors": [ + "Janez Dem{\\v{s}}ar" + ], + "id": "demsar06a", + "issue": 0, + "pages": [ + 1, + 30 + ], + "title": "Statistical Comparisons of Classifiers over Multiple Data Sets", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/ekdahl06a/ekdahl06a.pdf b/ekdahl06a/ekdahl06a.pdf new file mode 100644 index 0000000..0601566 Binary files /dev/null and b/ekdahl06a/ekdahl06a.pdf differ diff --git a/ekdahl06a/info.json b/ekdahl06a/info.json new file mode 100644 index 0000000..c0a53b9 --- /dev/null +++ b/ekdahl06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "In many pattern recognition/classification problem the true class\nconditional model and class probabilities are approximated for reasons\nof reducing complexity and/or of statistical estimation. The\napproximated classifier is expected to have worse performance, here\nmeasured by the probability of correct classification. We present an\nanalysis valid in general, and easily computable formulas for\nestimating the degradation in probability of correct classification\nwhen compared to the optimal classifier. An example of an\napproximation is the Naïve Bayes classifier. We show that the\nperformance of the Naïve Bayes depends on the degree of functional\ndependence between the features and labels. We provide a sufficient\ncondition for zero loss of performance, too.", + "authors": [ + "Magnus Ekdahl", + "Timo Koski" + ], + "id": "ekdahl06a", + "issue": 86, + "pages": [ + 2449, + 2480 + ], + "title": "Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/evendar06a/evendar06a.pdf b/evendar06a/evendar06a.pdf new file mode 100644 index 0000000..c22b0cc Binary files /dev/null and b/evendar06a/evendar06a.pdf differ diff --git a/evendar06a/info.json b/evendar06a/info.json new file mode 100644 index 0000000..d60c7a6 --- /dev/null +++ b/evendar06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We incorporate statistical confidence intervals in both the\nmulti-armed bandit and the reinforcement learning problems. In the\nbandit problem we show that given n arms, it suffices to pull the\narms a total of O((n2)log(1/δ)) times to\nfind an ε-optimal arm with probability of at least 1-δ.\nThis bound matches the lower bound of Mannor and Tsitsiklis (2004)\nup to constants. We also devise action elimination\nprocedures in reinforcement learning algorithms. We describe a\nframework that is based on learning the confidence interval around\nthe value function or the Q-function and eliminating actions that\nare not optimal (with high probability). We provide a model-based\nand a model-free variants of the elimination method. We further\nderive stopping conditions guaranteeing that the learned policy is\napproximately optimal with high probability. 
Simulations demonstrate\na considerable speedup and added robustness over ε-greedy\nQ-learning.", + "authors": [ + "Eyal Even-Dar", + "Shie Mannor", + "Yishay Mansour" + ], + "id": "evendar06a", + "issue": 38, + "pages": [ + 1079, + 1105 + ], + "title": "Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/fumera06a/fumera06a.pdf b/fumera06a/fumera06a.pdf new file mode 100644 index 0000000..75531c8 Binary files /dev/null and b/fumera06a/fumera06a.pdf differ diff --git a/fumera06a/info.json b/fumera06a/info.json new file mode 100644 index 0000000..29503cb --- /dev/null +++ b/fumera06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "In recent years anti-spam filters have become necessary tools for\nInternet service providers to face up to the continuously growing spam\nphenomenon. Current server-side anti-spam filters are made up of\nseveral modules aimed at detecting different features of spam e-mails.\nIn particular, text categorisation techniques have been investigated\nby researchers for the design of modules for the analysis of the\nsemantic content of e-mails, due to their potentially higher\ngeneralisation capability with respect to manually derived\nclassification rules used in current server-side filters. However,\nvery recently spammers introduced a new trick consisting of embedding\nthe spam message into attached images, which can make all current\ntechniques based on the analysis of digital text in the subject and\nbody fields of e-mails ineffective.\nIn this paper we propose an\napproach to anti-spam filtering which exploits the text information\nembedded into images sent as attachments. Our approach is based on\nthe application of state-of-the-art text categorisation techniques to\nthe analysis of text extracted by OCR tools from images attached to\ne-mails. The effectiveness of the proposed approach is experimentally\nevaluated on two large corpora of spam e-mails.", + "authors": [ + "Giorgio Fumera", + "Ignazio Pillai", + "Fabio Roli" + ], + "id": "fumera06a", + "issue": 97, + "pages": [ + 2699, + 2720 + ], + "title": "Spam Filtering Based On The Analysis Of Text Information Embedded Into Images", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/gardner06a/gardner06a.pdf b/gardner06a/gardner06a.pdf new file mode 100644 index 0000000..de9bdb7 Binary files /dev/null and b/gardner06a/gardner06a.pdf differ diff --git a/gardner06a/info.json b/gardner06a/info.json new file mode 100644 index 0000000..b62eb70 --- /dev/null +++ b/gardner06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "This paper describes an application of one-class support vector\nmachine (SVM) novelty detection for detecting seizures in humans. Our\ntechnique maps intracranial electroencephalogram (EEG) time series\ninto corresponding novelty sequences by classifying short-time,\nenergy-based statistics computed from one-second windows of data. We\ntrain a classifier on epochs of interictal (normal) EEG. During ictal\n(seizure) epochs of EEG, seizure activity induces distributional\nchanges in feature space that increase the empirical outlier\nfraction. A hypothesis test determines when the parameter change\ndiffers significantly from its nominal value, signaling a seizure\ndetection event. Outputs are gated in a .one-shot. manner using\npersistence to reduce the false alarm rate of the system. 
The detector\nwas validated using leave-one-out cross-validation (LOO-CV) on a\nsample of 41 interictal and 29 ictal epochs, and achieved 97.1%\nsensitivity, a mean detection latency of -7.58 seconds, and an\nasymptotic false positive rate (FPR) of 1.56 false positives per hour\n(Fp/hr). These results are better than those obtained from a novelty\ndetection technique based on Mahalanobis distance outlier detection,\nand comparable to the performance of a supervised learning technique\nused in experimental implantable devices (Echauz et al., 2001). The\nnovelty detection paradigm overcomes three significant limitations of\ncompeting methods: the need to collect seizure data, precisely mark\nseizure onset and offset times, and perform patient-specific parameter\ntuning for detector training.", + "authors": [ + "Andrew B. Gardner", + "Abba M. Krieger", + "George Vachtsevanos", + "Brian Litt" + ], + "id": "gardner06a", + "issue": 36, + "pages": [ + 1025, + 1044 + ], + "title": "One-Class Novelty Detection for Seizure Analysis from Intracranial EEG", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/glasmachers06a/glasmachers06a.pdf b/glasmachers06a/glasmachers06a.pdf new file mode 100644 index 0000000..817451f Binary files /dev/null and b/glasmachers06a/glasmachers06a.pdf differ diff --git a/glasmachers06a/info.json b/glasmachers06a/info.json new file mode 100644 index 0000000..431b5d0 --- /dev/null +++ b/glasmachers06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "Support vector machines are trained by solving constrained\nquadratic\noptimization problems.\nThis is usually done with an iterative decomposition algorithm\noperating on a small working set of variables in every iteration.\nThe training time strongly depends on the selection of these\nvariables. We propose the maximum-gain working set selection\nalgorithm for large scale quadratic programming. It is based on the\nidea to greedily maximize the progress in each single iteration. The\nalgorithm takes second order information from cached kernel matrix\nentries into account. We prove the convergence to an optimal\nsolution of a variant termed hybrid maximum-gain working set\nselection. This method is empirically compared to the prominent\nmost violating pair selection and the latest algorithm using second\norder information. For large training sets our new selection scheme\nis significantly faster.", + "authors": [ + "Tobias Glasmachers", + "Christian Igel" + ], + "id": "glasmachers06a", + "issue": 52, + "pages": [ + 1437, + 1466 + ], + "title": "Maximum-Gain Working Set Selection for SVMs", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/goldberg06a/goldberg06a.pdf b/goldberg06a/goldberg06a.pdf new file mode 100644 index 0000000..5155219 Binary files /dev/null and b/goldberg06a/goldberg06a.pdf differ diff --git a/goldberg06a/info.json b/goldberg06a/info.json new file mode 100644 index 0000000..a735c5b --- /dev/null +++ b/goldberg06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "

\nA classical approach in multi-class pattern classification is the\nfollowing. Estimate the probability distributions that generated the\nobservations for each label class, and then label new instances by\napplying the Bayes classifier to the estimated distributions. That\napproach provides more useful information than just a class label; it\nalso provides estimates of the conditional distribution of class\nlabels, in situations where there is class overlap.\n

\nWe would like to know whether it is harder to build accurate\nclassifiers via this approach than by techniques that may process\nall data with distinct labels together. In this paper we make\nthat question precise by considering it in the context of PAC\nlearnability. We propose two restrictions on the PAC learning\nframework that are intended to correspond with the above approach,\nand consider their relationship with standard PAC learning.\nOur main restriction of interest leads to some interesting algorithms\nshowing that the restriction is not stronger (more restrictive)\nthan various other well-known restrictions on PAC learning.\nAn alternative, slightly milder restriction turns out to be almost\nequivalent to unrestricted PAC learning.\n

", + "authors": [ + "Paul W. Goldberg" + ], + "id": "goldberg06a", + "issue": 9, + "pages": [ + 283, + 306 + ], + "title": "Some Discriminant-Based PAC Algorithms", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/hamerly06a/hamerly06a.pdf b/hamerly06a/hamerly06a.pdf new file mode 100644 index 0000000..b26204c Binary files /dev/null and b/hamerly06a/hamerly06a.pdf differ diff --git a/hamerly06a/info.json b/hamerly06a/info.json new file mode 100644 index 0000000..14bfc4e --- /dev/null +++ b/hamerly06a/info.json @@ -0,0 +1,19 @@ +{ + "abstract": "

\nAn essential step in designing a new computer architecture is the\ncareful examination of different design options. It is critical that\ncomputer architects have efficient means by which they may estimate\nthe impact of various design options on the overall machine. This\ntask is complicated by the fact that different programs, and even\ndifferent parts of the same program, may have distinct behaviors\nthat interact with the hardware in different ways. Researchers use\nvery detailed simulators, which model every cycle of an executing\nprogram, to estimate processor performance. Unfortunately, simulating\nevery cycle of a real program can take weeks or months.\n

\nTo address this problem we have created a tool called SimPoint that\nuses data clustering algorithms from machine learning to automatically\nfind repetitive patterns in a program's execution. By simulating one\nrepresentative of each repetitive behavior pattern, simulation time\ncan be reduced to minutes instead of weeks for standard benchmark\nprograms, with very little cost in terms of accuracy. We describe this\nimportant problem, the data representation and preprocessing methods\nused by SimPoint, the clustering algorithm at the core of SimPoint,\nand we evaluate different options for tuning SimPoint.\n

", + "authors": [ + "Greg Hamerly", + "Erez Perelman", + "Jeremy Lau", + "Brad Calder", + "Timothy Sherwood" + ], + "id": "hamerly06a", + "issue": 11, + "pages": [ + 343, + 378 + ], + "title": "Using Machine Learning to Guide Architecture Simulation", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/heiler06a/heiler06a.pdf b/heiler06a/heiler06a.pdf new file mode 100644 index 0000000..41fbe18 Binary files /dev/null and b/heiler06a/heiler06a.pdf differ diff --git a/heiler06a/info.json b/heiler06a/info.json new file mode 100644 index 0000000..e2254c5 --- /dev/null +++ b/heiler06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We exploit the biconvex nature of the Euclidean non-negative matrix\nfactorization (NMF) optimization problem to derive optimization\nschemes based on sequential quadratic and second order cone\nprogramming. We show that for ordinary NMF, our approach performs\nas well as existing state-of-the-art algorithms, while for\nsparsity-constrained NMF, as recently proposed by P. O. Hoyer in\nJMLR 5 (2004), it outperforms previous methods. In addition,\nwe show how to extend NMF learning within the same optimization\nframework in order to make use of class membership information in\nsupervised learning problems.", + "authors": [ + "Matthias Heiler", + "Christoph Schn{{\\\"o}}rr" + ], + "id": "heiler06a", + "issue": 50, + "pages": [ + 1385, + 1407 + ], + "title": "Learning Sparse Representations by Non-Negative Matrix Factorization and Sequential Cone Programming", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/huang06a/huang06a.pdf b/huang06a/huang06a.pdf new file mode 100644 index 0000000..255a4f1 Binary files /dev/null and b/huang06a/huang06a.pdf differ diff --git a/huang06a/info.json b/huang06a/info.json new file mode 100644 index 0000000..fd70233 --- /dev/null +++ b/huang06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "The Bradley-Terry model for obtaining individual skill from paired\ncomparisons has been popular in many areas. In machine learning, this\nmodel is related to multi-class probability estimates by coupling all\npairwise classification results. Error correcting output codes (ECOC)\nare a general framework to decompose a multi-class problem to several\nbinary problems. To obtain probability estimates under this framework,\nthis paper introduces a generalized Bradley-Terry model in which\npaired individual comparisons are extended to paired team comparisons.\nWe propose a simple algorithm with convergence proofs to solve the\nmodel and obtain individual skill. Experiments on synthetic and re al\ndata demonstrate that the algorithm is useful for obtaining\nmulti-class probability estimates. Moreover, we discuss four\nextensions of the proposed model: 1) weighted individual skill, 2)\nhome-field advantage, 3) ties, and 4) comparisons with more than two\nteams.", + "authors": [ + "Tzu-Kuo Huang", + "Ruby C. 
Weng", + "Chih-Jen Lin" + ], + "id": "huang06a", + "issue": 3, + "pages": [ + 85, + 115 + ], + "title": "Generalized Bradley-Terry Models and Multi-Class Probability Estimates", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/hush06a/hush06a.pdf b/hush06a/hush06a.pdf new file mode 100644 index 0000000..b1ae729 Binary files /dev/null and b/hush06a/hush06a.pdf differ diff --git a/hush06a/info.json b/hush06a/info.json new file mode 100644 index 0000000..d88b05b --- /dev/null +++ b/hush06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "We describe polynomial--time algorithms\nthat produce approximate solutions with guaranteed\naccuracy for a class of QP problems that are used in the\ndesign of support vector machine classifiers.\nThese algorithms employ a two--stage process where the\nfirst stage produces an approximate\nsolution to a dual QP problem and the second stage maps\nthis approximate dual solution to an approximate primal solution.\nFor the second stage we describe an O(n log n)\nalgorithm that maps an approximate dual solution with accuracy\n(2(2Km)1/2+8(λ)1/2)-2 \nλ εp2\nto an approximate primal solution with\naccuracy εp\nwhere n is the number of data samples,\nKn is the maximum kernel value over the data and\nλ > 0 is the SVM regularization parameter.\nFor the first stage we present new results\nfor decomposition algorithms and\ndescribe new decomposition algorithms with guaranteed\naccuracy and run time.\nIn particular, for τ-rate certifying decomposition algorithms\nwe establish the optimality of τ = 1/(n-1).\nIn addition\nwe extend the recent τ = 1/(n-1) algorithm of Simon\n(2004) to form two new composite algorithms\nthat also achieve the τ = 1/(n-1) iteration bound\nof List and Simon (2005), but yield faster run times in practice.\nWe also exploit the τ-rate certifying property of these\nalgorithms to produce new stopping rules that are computationally\nefficient and that guarantee a specified accuracy for the\napproximate dual solution.\nFurthermore,\nfor the dual QP problem corresponding to the standard classification\nproblem we describe operational conditions for which the Simon and composite\nalgorithms possess an upper bound of O(n) on the number of iterations.\nFor this same problem we also describe general conditions for which\na matching lower bound exists\nfor any decomposition algorithm that uses working sets of size 2.\nFor the Simon and composite algorithms we also establish an O(n2)\nbound on the overall run time for the first stage.\nCombining the first and second stages gives\nan overall run time of O(n2(ck + 1))\nwhere ck is an upper bound on the computation to perform\na kernel evaluation. 
Pseudocode is presented\nfor a complete algorithm that inputs an accuracy εp\nand produces an approximate solution that satisfies\nthis accuracy in low order polynomial time.\nExperiments are included to illustrate the new stopping rules and\nto compare the Simon and composite decomposition algorithms.", + "authors": [ + "Don Hush", + "Patrick Kelly", + "Clint Scovel", + "Ingo Steinwart" + ], + "id": "hush06a", + "issue": 25, + "pages": [ + 733, + 769 + ], + "title": "QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/jonsson06a/info.json b/jonsson06a/info.json new file mode 100644 index 0000000..431f8e2 --- /dev/null +++ b/jonsson06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We present Variable Influence Structure Analysis, or VISA, an\nalgorithm that performs hierarchical decomposition of factored\nMarkov decision processes.\nVISA uses a dynamic Bayesian network model of actions, and\nconstructs a causal graph that captures relationships between\nstate variables.\nIn tasks with sparse causal graphs VISA exploits structure by\nintroducing activities that cause the values of state variables\nto change.\nThe result is a hierarchy of activities that together represent a\nsolution to the original task.\nVISA performs state abstraction for each activity by\nignoring irrelevant state variables and lower-level activities.\nIn addition, we describe an algorithm for constructing compact\nmodels of the activities introduced.\nState abstraction and compact activity models enable VISA\nto apply efficient algorithms to solve the stand-alone subtask\nassociated with each activity.\nExperimental results show that the decomposition introduced by\nVISA can significantly accelerate construction of an optimal, or\nnear-optimal, policy.", + "authors": [ + "Anders Jonsson", + "Andrew Barto" + ], + "id": "jonsson06a", + "issue": 80, + "pages": [ + 2259, + 2301 + ], + "title": "Causal Graph Based Decomposition of Factored MDPs", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/jonsson06a/jonsson06a.pdf b/jonsson06a/jonsson06a.pdf new file mode 100644 index 0000000..a27122d Binary files /dev/null and b/jonsson06a/jonsson06a.pdf differ diff --git a/kaempke06a/info.json b/kaempke06a/info.json new file mode 100644 index 0000000..5e4b56d --- /dev/null +++ b/kaempke06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "Similarity of edge labeled graphs is considered in the sense of minimum \nsquared distance between corresponding values. Vertex correspondences are \nestablished by isomorphisms if both graphs are of equal size and by \nsubisomorphisms if one graph has fewer vertices than the other. 
Best fit \nisomorphisms and subisomorphisms amount to solutions of quadratic \nassignment problems and are computed exactly as well as approximately\nby minimum cost flow, linear assignment relaxations and related graph \nalgorithms.", + "authors": [ + "Thomas Kämpke" + ], + "id": "kaempke06a", + "issue": 73, + "pages": [ + 2065, + 2086 + ], + "title": "Distance Patterns in Structural Similarity", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/kaempke06a/kaempke06a.pdf b/kaempke06a/kaempke06a.pdf new file mode 100644 index 0000000..6eaa503 Binary files /dev/null and b/kaempke06a/kaempke06a.pdf differ diff --git a/keerthi06a/info.json b/keerthi06a/info.json new file mode 100644 index 0000000..42fa241 --- /dev/null +++ b/keerthi06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "Support vector machines (SVMs), though accurate, are not preferred in\napplications requiring great classification speed, due to the number\nof support vectors being large. To overcome this problem we devise a\nprimal method with the following properties: (1) it decouples the idea\nof basis functions from the concept of support vectors; (2) it\ngreedily finds a set of kernel basis functions of a specified maximum\nsize (dmax) to approximate the SVM primal cost \nfunction well; (3)\nit is efficient and roughly scales as O(ndmax2) where \nn is the\nnumber of training examples; and, (4) the number of basis functions it\nrequires to achieve an accuracy close to the SVM accuracy is usually\nfar less than the number of SVM support vectors.", + "authors": [ + "S. Sathiya Keerthi", + "Olivier Chapelle", + "Dennis DeCoste" + ], + "id": "keerthi06a", + "issue": 54, + "pages": [ + 1493, + 1515 + ], + "title": "Building Support Vector Machines with Reduced Classifier Complexity", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/keerthi06a/keerthi06a.pdf b/keerthi06a/keerthi06a.pdf new file mode 100644 index 0000000..d6a0f7d Binary files /dev/null and b/keerthi06a/keerthi06a.pdf differ diff --git a/kim06a/info.json b/kim06a/info.json new file mode 100644 index 0000000..99153dd --- /dev/null +++ b/kim06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "This paper proposes a general probabilistic framework for\nshape-based modeling and classification of waveform data. A\nsegmental hidden Markov model (HMM) is used to characterize\nwaveform shape and shape variation is captured by adding random\neffects to the segmental model. The resulting probabilistic\nframework provides a basis for learning of waveform models from\ndata as well as parsing and recognition of new waveforms.\nExpectation-maximization (EM) algorithms are derived and\ninvestigated for fitting such models to data. In particular, the\n\"expectation conditional maximization either\" (ECME) algorithm is\nshown to provide significantly faster convergence than a standard\nEM procedure. 
Experimental results on two real-world data sets\ndemonstrate that the proposed approach leads to improved accuracy\nin classification and segmentation when compared to alternatives\nsuch as Euclidean distance matching, dynamic time warping, and\nsegmental HMMs without random effects.", + "authors": [ + "Seyoung Kim", + "Padhraic Smyth" + ], + "id": "kim06a", + "issue": 32, + "pages": [ + 945, + 969 + ], + "title": "Segmental Hidden Markov Models with Random Effects for Waveform Modeling", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/kim06a/kim06a.pdf b/kim06a/kim06a.pdf new file mode 100644 index 0000000..84e2eab Binary files /dev/null and b/kim06a/kim06a.pdf differ diff --git a/kitzelmann06a/info.json b/kitzelmann06a/info.json new file mode 100644 index 0000000..94676fc --- /dev/null +++ b/kitzelmann06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We describe an approach to the inductive synthesis of recursive\nequations from input/output-examples which is based on the classical\ntwo-step approach to induction of functional Lisp programs of\nSummers (1977). In a first step, I/O-examples are rewritten to\ntraces which explain the outputs given the respective inputs based on\na datatype theory. These traces can be integrated into one conditional\nexpression which represents a non-recursive program. In a second\nstep, this initial program term is generalized into recursive\nequations by searching for syntactical regularities in the term. Our\napproach extends the classical work in several aspects. The most\nimportant extensions are that we are able to induce a set of\nrecursive equations in one synthesizing step, the equations may\ncontain more than one recursive call, and additionally needed\nparameters are automatically introduced.", + "authors": [ + "Emanuel Kitzelmann", + "Ute Schmid" + ], + "id": "kitzelmann06a", + "issue": 14, + "pages": [ + 429, + 454 + ], + "title": "Inductive Synthesis of Functional Programs: An Explanation Based Generalization Approach", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/kitzelmann06a/kitzelmann06a.pdf b/kitzelmann06a/kitzelmann06a.pdf new file mode 100644 index 0000000..b5fe193 Binary files /dev/null and b/kitzelmann06a/kitzelmann06a.pdf differ diff --git a/klivans06a/info.json b/klivans06a/info.json new file mode 100644 index 0000000..6a76122 --- /dev/null +++ b/klivans06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "

\nWe consider two well-studied problems regarding attribute\nefficient learning: learning decision lists and learning parity\nfunctions. First, we give an algorithm for learning decision\nlists of length k over n variables using 2^{Õ(k^{1/3})} log n\nexamples and time n^{Õ(k^{1/3})}. This is the first\nalgorithm for learning decision lists that has both subexponential\nsample complexity and subexponential running time in the relevant\nparameters. Our approach is based on\na new construction of low degree, low weight polynomial threshold\nfunctions for decision lists. For a wide range of parameters our\nconstruction matches a lower bound due to Beigel for\ndecision lists and gives an essentially optimal tradeoff between\npolynomial threshold function degree and weight.\n

\nSecond, we give an\nalgorithm for learning an unknown parity function on k out of n\nvariables using O(n^{1-1/k}) examples in poly(n) time. For\nk=o(log n) this yields the first polynomial time algorithm\nfor learning parity on a superconstant number of variables with\nsublinear sample complexity. We also give a simple algorithm\nfor learning an unknown length-k parity using O(k log n)\nexamples in n^{k/2} time, which improves on the naive n^{k}\ntime bound of exhaustive search.\n

", + "authors": [ + "Adam R. Klivans", + "Rocco A. Servedio" + ], + "id": "klivans06a", + "issue": 19, + "pages": [ + 587, + 602 + ], + "title": "Toward Attribute Efficient Learning of Decision Lists and Parities", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/klivans06a/klivans06a.pdf b/klivans06a/klivans06a.pdf new file mode 100644 index 0000000..02a79f4 Binary files /dev/null and b/klivans06a/klivans06a.pdf differ diff --git a/kok06a/info.json b/kok06a/info.json new file mode 100644 index 0000000..d6b280f --- /dev/null +++ b/kok06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "In this article we describe a set of scalable techniques for\nlearning the behavior of a group of agents in a collaborative\nmultiagent setting. As a basis we use the framework of coordination\ngraphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between\nagents to decompose the global payoff function into a sum of local\nterms. First, we deal with the single-state case and describe a\npayoff propagation algorithm that computes the individual actions\nthat approximately maximize the global payoff function. The method\ncan be viewed as the decision-making analogue of belief propagation\nin Bayesian networks. Second, we focus on learning the behavior of\nthe agents in sequential decision-making tasks. We introduce\ndifferent model-free reinforcement-learning techniques, unitedly\ncalled Sparse Cooperative Q-learning, which approximate the global\naction-value function based on the topology of a coordination graph,\nand perform updates using the contribution of the individual agents\nto the maximal global action value. The combined use of an\nedge-based decomposition of the action-value function and the payoff\npropagation algorithm for efficient action selection, result in an\napproach that scales only linearly in the problem size. We provide\nexperimental evidence that our method outperforms related multiagent\nreinforcement-learning methods based on temporal differences.", + "authors": [ + "Jelle R. 
Kok", + "Nikos Vlassis" + ], + "id": "kok06a", + "issue": 64, + "pages": [ + 1789, + 1828 + ], + "title": "Collaborative Multiagent Reinforcement Learning by Payoff Propagation", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/kok06a/kok06a.pdf b/kok06a/kok06a.pdf new file mode 100644 index 0000000..4e1c17f Binary files /dev/null and b/kok06a/kok06a.pdf differ diff --git a/kolter06a/info.json b/kolter06a/info.json new file mode 100644 index 0000000..3b895f9 --- /dev/null +++ b/kolter06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We describe the use of machine learning and data mining to\ndetect and classify malicious executables as they appear in the wild.\nWe gathered 1,971 benign and 1,651 malicious executables and encoded\neach as a training example using n-grams of byte codes as features.\nSuch processing resulted in more than 255 million distinct n-grams.\nAfter selecting the most relevant n-grams for prediction,\nwe evaluated a variety of inductive methods, including naive Bayes,\ndecision trees, support vector machines, and boosting.\nUltimately, boosted decision trees outperformed other methods\nwith an area under the ROC curve of 0.996.\nResults suggest that our methodology will scale to larger collections\nof executables.\nWe also evaluated how well the methods classified executables based\non the function of their payload, such as opening a backdoor\nand mass-mailing.\nAreas under the ROC curve for detecting payload function\nwere in the neighborhood of 0.9, which were smaller than those for\nthe detection task.\nHowever, we attribute this drop in performance to fewer training\nexamples and to the challenge of obtaining properly labeled examples,\nrather than to a failing of the methodology or to some inherent difficulty\nof the classification task.\nFinally, we applied detectors to 291 malicious executables\ndiscovered after we gathered our original collection,\nand boosted decision trees achieved a true-positive rate of 0.98 for\na desired false-positive rate of 0.05.\nThis result is particularly important, for it suggests that our\nmethodology could be used as the basis for an operational system\nfor detecting previously undiscovered malicious executables.", + "authors": [ + "J. Zico Kolter", + "Marcus A. Maloof" + ], + "id": "kolter06a", + "issue": 98, + "pages": [ + 2721, + 2744 + ], + "title": "Learning to Detect and Classify Malicious Executables in the Wild", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/kolter06a/kolter06a.pdf b/kolter06a/kolter06a.pdf new file mode 100644 index 0000000..3221096 Binary files /dev/null and b/kolter06a/kolter06a.pdf differ diff --git a/langley06a/info.json b/langley06a/info.json new file mode 100644 index 0000000..0f67640 --- /dev/null +++ b/langley06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "In this paper, we propose a new representation for physical control\n-- teleoreactive logic programs -- along with an interpreter that\nuses them to achieve goals. In addition, we present a new learning\nmethod that acquires recursive forms of these structures from traces\nof successful problem solving. We report experiments in three different\ndomains that demonstrate the generality of this approach. 
In closing,\nwe review related work on learning complex skills and discuss directions\nfor future research on this topic.", + "authors": [ + "Pat Langley", + "Dongkyu Choi" + ], + "id": "langley06a", + "issue": 16, + "pages": [ + 493, + 518 + ], + "title": "Learning Recursive Control Programs from Problem Solving", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/langley06a/langley06a.pdf b/langley06a/langley06a.pdf new file mode 100644 index 0000000..baafcf7 Binary files /dev/null and b/langley06a/langley06a.pdf differ diff --git a/laskov06a/info.json b/laskov06a/info.json new file mode 100644 index 0000000..b5da6f2 --- /dev/null +++ b/laskov06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "Incremental Support Vector Machines (SVM) are instrumental in\npractical applications of online learning. This work focuses on the\ndesign and analysis of efficient incremental SVM learning, with the\naim of providing a fast, numerically stable and robust\nimplementation. A detailed analysis of convergence and of\nalgorithmic complexity of incremental SVM learning is carried out.\nBased on this analysis, a new design of storage and numerical\noperations is proposed, which speeds up the training of an\nincremental SVM by a factor of 5 to 20. The performance of the new\nalgorithm is demonstrated in two scenarios: learning with limited\nresources and active learning. Various applications of the\nalgorithm, such as in drug discovery, online monitoring of\nindustrial devices and and surveillance of network traffic, can be\nforeseen.", + "authors": [ + "Pavel Laskov", + "Christian Gehl", + "Stefan Kr{{\\\"u}}ger", + "Klaus-Robert M{{\\\"u}}ller" + ], + "id": "laskov06a", + "issue": 68, + "pages": [ + 1909, + 1936 + ], + "title": "Incremental Support Vector Learning: Analysis, Implementation and Applications", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/laskov06a/laskov06a.pdf b/laskov06a/laskov06a.pdf new file mode 100644 index 0000000..1f7d987 Binary files /dev/null and b/laskov06a/laskov06a.pdf differ diff --git a/lecue06a/info.json b/lecue06a/info.json new file mode 100644 index 0000000..907a8fa --- /dev/null +++ b/lecue06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "In this paper we prove the optimality of an aggregation\nprocedure. We prove lower bounds for aggregation of model\nselection type of M density estimators for the Kullback-Leibler\ndivergence (KL), the Hellinger's distance and the L1-distance.\nThe lower bound, with respect to the KL distance, can be achieved\nby the on-line type estimate suggested, among others, by\nYang (2000a). Combining these results, we state that log\nM/n is an optimal rate of aggregation in the sense of\nTsybakov (2003), where n is the sample size.", + "authors": [ + "Guillaume Lecu{{\\'e}}" + ], + "id": "lecue06a", + "issue": 33, + "pages": [ + 971, + 981 + ], + "title": "Lower Bounds and Aggregation in Density Estimation", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/lecue06a/lecue06a.pdf b/lecue06a/lecue06a.pdf new file mode 100644 index 0000000..1a17e6e Binary files /dev/null and b/lecue06a/lecue06a.pdf differ diff --git a/lippert06a/info.json b/lippert06a/info.json new file mode 100644 index 0000000..af30059 --- /dev/null +++ b/lippert06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We consider the problem of Tikhonov regularization with a general\nconvex loss function: this formalism includes\nsupport vector machines and regularized least squares. 
For a family of\nkernels that includes the Gaussian, parameterized by a \"bandwidth\"\nparameter σ, we characterize the limiting solution as σ\n→ ∞. In particular, we show that if we\nset the regularization parameter λ = ~λ σ-2p, the regularization term of the Tikhonov\nproblem tends to an indicator function on polynomials of degree\n⌊p⌋ (with residual regularization in the case where p\n∈ Z). The proof rests on two key ideas: epi-convergence, a\nnotion of functional convergence under which limits of minimizers\nconverge to minimizers of limits, and a value-based formulation\nof learning, where we work with regularization on the function output\nvalues (y) as opposed to the function expansion coefficients in the\nRKHS. Our result generalizes and unifies previous results in this\narea.", + "authors": [ + "Ross A. Lippert", + "Ryan M. Rifkin" + ], + "id": "lippert06a", + "issue": 29, + "pages": [ + 855, + 876 + ], + "title": "Infinite-\u00cf\u0083 Limits For Tikhonov Regularization", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/lippert06a/lippert06a.pdf b/lippert06a/lippert06a.pdf new file mode 100644 index 0000000..e8f41bd Binary files /dev/null and b/lippert06a/lippert06a.pdf differ diff --git a/liu06a/info.json b/liu06a/info.json new file mode 100644 index 0000000..95a63cb --- /dev/null +++ b/liu06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "This paper is about non-approximate acceleration of high-dimensional\nnonparametric operations such as k nearest neighbor classifiers. We\nattempt to exploit the fact that even if we want exact answers to\nnonparametric queries, we usually do not need to explicitly find the\ndata points close to the query, but merely need to answer questions\nabout the properties of that set of data points. This offers a small\namount of computational leeway, and we investigate how much that\nleeway can be exploited. This is applicable to many algorithms in\nnonparametric statistics, memory-based learning and kernel-based\nlearning. But for clarity, this paper concentrates on pure k-NN\nclassification. We introduce new ball-tree algorithms that on\nreal-world data sets give accelerations from 2-fold to 100-fold\ncompared against highly optimized traditional ball-tree-based\nk-NN. These results include data sets with up to 106 \ndimensions and 105 records, and demonstrate non-trivial \nspeed-ups while giving exact answers.", + "authors": [ + "Ting Liu", + "Andrew W. Moore", + "Alexander Gray" + ], + "id": "liu06a", + "issue": 40, + "pages": [ + 1135, + 1158 + ], + "title": "New Algorithms for Efficient High-Dimensional Nonparametric Classification", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/liu06a/liu06a.pdf b/liu06a/liu06a.pdf new file mode 100644 index 0000000..016df0f Binary files /dev/null and b/liu06a/liu06a.pdf differ diff --git a/malioutov06a/info.json b/malioutov06a/info.json new file mode 100644 index 0000000..a52c402 --- /dev/null +++ b/malioutov06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We present a new framework based on walks in a graph for analysis and\ninference in Gaussian graphical models. The key idea is to decompose\nthe correlation between each pair of variables as a sum over all walks\nbetween those variables in the graph. The weight of each walk is given\nby a product of edgewise partial correlation coefficients. This\nrepresentation holds for a large class of Gaussian graphical models\nwhich we call walk-summable. 
We give a precise characterization of\nthis class of models, and relate it to other classes including\ndiagonally dominant, attractive, non-frustrated, and\npairwise-normalizable. We provide a walk-sum interpretation of\nGaussian belief propagation in trees and of the approximate method of\nloopy belief propagation in graphs with cycles. The walk-sum\nperspective leads to a better understanding of Gaussian belief\npropagation and to stronger results for its convergence in loopy\ngraphs.", + "authors": [ + "Dmitry M. Malioutov", + "Jason K. Johnson", + "Alan S. Willsky" + ], + "id": "malioutov06a", + "issue": 72, + "pages": [ + 2031, + 2064 + ], + "title": "Walk-Sums and Belief Propagation in Gaussian Graphical Models", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/malioutov06a/malioutov06a.pdf b/malioutov06a/malioutov06a.pdf new file mode 100644 index 0000000..9bade33 Binary files /dev/null and b/malioutov06a/malioutov06a.pdf differ diff --git a/mangasarian06a/info.json b/mangasarian06a/info.json new file mode 100644 index 0000000..4ecde5e --- /dev/null +++ b/mangasarian06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "Support vector machines utilizing the 1-norm, typically\nset up as linear programs (Mangasarian, 2000; Bradley and \nMangasarian, 1998), are formulated here\nas a completely unconstrained minimization of a convex differentiable \npiecewise-quadratic objective function in the dual space. The objective function,\nwhich has a Lipschitz continuous gradient and contains only one\nadditional finite parameter, can be minimized by a generalized\nNewton method and leads to an exact solution of the support vector\nmachine problem. The approach here is based on a formulation\nof a very general linear program as an unconstrained minimization\nproblem and its application to support vector machine classification\nproblems. The present approach which generalizes both\n(Mangasarian, 2004) and (Fung and Mangasarian, 2004) is also applied to nonlinear\napproximation where a minimal number of nonlinear kernel functions\nare utilized to approximate a function from a given number\nof function values.", + "authors": [ + "Olvi L. Mangasarian" + ], + "id": "mangasarian06a", + "issue": 55, + "pages": [ + 1517, + 1530 + ], + "title": "Exact 1-Norm Support Vector Machines Via Unconstrained Convex Differentiable Minimization", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/mangasarian06a/mangasarian06a.pdf b/mangasarian06a/mangasarian06a.pdf new file mode 100644 index 0000000..ace28a7 Binary files /dev/null and b/mangasarian06a/mangasarian06a.pdf differ diff --git a/maurer06a/info.json b/maurer06a/info.json new file mode 100644 index 0000000..ef74d8c --- /dev/null +++ b/maurer06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "We give dimension-free and data-dependent bounds for linear multi-task\nlearning where a common linear operator is chosen to preprocess data for a\nvector of task specific linear-thresholding classifiers. The complexity\npenalty of multi-task learning is bounded by a simple expression involving\nthe margins of the task-specific classifiers, the Hilbert-Schmidt norm of\nthe selected preprocessor and the Hilbert-Schmidt norm of the covariance\noperator for the total mixture of all task distributions, or, alternatively,\nthe Frobenius norm of the total Gramian matrix for the data-dependent\nversion. 
The results can be compared to state-of-the-art results on linear\nsingle-task learning.", + "authors": [ + "Andreas Maurer" + ], + "id": "maurer06a", + "issue": 4, + "pages": [ + 117, + 139 + ], + "title": "Bounds for Linear Multi-Task Learning", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/maurer06a/maurer06a.pdf b/maurer06a/maurer06a.pdf new file mode 100644 index 0000000..e398a70 Binary files /dev/null and b/maurer06a/maurer06a.pdf differ diff --git a/meinshausen06a/info.json b/meinshausen06a/info.json new file mode 100644 index 0000000..408530e --- /dev/null +++ b/meinshausen06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "Random forests were introduced as a machine learning tool \nin Breiman (2001) and have\nsince proven to be very popular and powerful for high-dimensional \nregression and classification. \nFor regression, random forests give an accurate approximation of the\nconditional mean of a response variable. \nIt is shown here that random forests provide information about\nthe full conditional distribution of the response variable, not only\nabout the conditional mean. Conditional quantiles can be inferred with\nquantile regression forests, a generalisation of random forests.\nQuantile regression forests give a non-parametric and accurate\nway of estimating conditional quantiles for high-dimensional predictor\nvariables. \nThe algorithm is shown to be consistent. Numerical examples suggest that\nthe algorithm is competitive in terms of predictive power.", + "authors": [ + "Nicolai Meinshausen" + ], + "id": "meinshausen06a", + "issue": 34, + "pages": [ + 983, + 999 + ], + "title": "Quantile Regression Forests", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/meinshausen06a/meinshausen06a.pdf b/meinshausen06a/meinshausen06a.pdf new file mode 100644 index 0000000..7a8ab10 Binary files /dev/null and b/meinshausen06a/meinshausen06a.pdf differ diff --git a/micchelli06a/info.json b/micchelli06a/info.json new file mode 100644 index 0000000..68bd254 --- /dev/null +++ b/micchelli06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "In this paper we investigate conditions on the features of a\ncontinuous kernel so that it may approximate an arbitrary continuous\ntarget function uniformly on any compact subset of the input space.\nA number of concrete examples are given of kernels with this\nuniversal approximating property.", + "authors": [ + "Charles A. Micchelli", + "Yuesheng Xu", + "Haizhang Zhang" + ], + "id": "micchelli06a", + "issue": 94, + "pages": [ + 2651, + 2667 + ], + "title": "Universal Kernels", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/micchelli06a/micchelli06a.pdf b/micchelli06a/micchelli06a.pdf new file mode 100644 index 0000000..50ab2a7 Binary files /dev/null and b/micchelli06a/micchelli06a.pdf differ diff --git a/moser06a/info.json b/moser06a/info.json new file mode 100644 index 0000000..f2a0255 --- /dev/null +++ b/moser06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "Kernels are two-placed functions that can be interpreted as inner\nproducts in some Hilbert space. It is this property which makes\nkernels predestinated to carry linear models of learning,\noptimization or classification strategies over to non-linear variants.\nFollowing this idea, various kernel-based methods like support vector\nmachines or kernel principal component analysis have been conceived\nwhich prove to be successful for machine learning, data mining and\ncomputer vision applications. 
When applying a kernel-based method a\ncentral question is the choice and the design of the kernel function.\nThis paper provides a novel view on kernels based on fuzzy-logical\nconcepts which allows to incorporate prior knowledge in the design\nprocess. It is demonstrated that kernels mapping to the unit interval\nwith constant one in its diagonal can be represented by a commonly\nused fuzzy-logical formula for representing fuzzy rule bases. This\nmeans that a great class of kernels can be represented by\nfuzzy-logical concepts. Apart from this result, which only guarantees\nthe existence of such a representation, constructive examples are\npresented and the relation to unlabeled learning is pointed out.", + "authors": [ + "Bernhard Moser" + ], + "id": "moser06a", + "issue": 92, + "pages": [ + 2603, + 2620 + ], + "title": "On Representing and Generating Kernels by Fuzzy Equivalence Relations", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/moser06a/moser06a.pdf b/moser06a/moser06a.pdf new file mode 100644 index 0000000..d6053b5 Binary files /dev/null and b/moser06a/moser06a.pdf differ diff --git a/mukherjee06a/info.json b/mukherjee06a/info.json new file mode 100644 index 0000000..168c479 --- /dev/null +++ b/mukherjee06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We introduce an algorithm that learns gradients from samples in\nthe supervised learning framework. An error analysis is given for\nthe convergence of the gradient estimated by the algorithm to the\ntrue gradient. The utility of the algorithm for the problem of\nvariable selection as well as determining variable covariance is\nillustrated on simulated data as well as two gene expression\ndata sets. For square loss we provide a very efficient\nimplementation with respect to both memory and time.", + "authors": [ + "Sayan Mukherjee", + "Ding-Xuan Zhou" + ], + "id": "mukherjee06a", + "issue": 17, + "pages": [ + 519, + 549 + ], + "title": "Learning Coordinate Covariances via Gradients", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/mukherjee06a/mukherjee06a.pdf b/mukherjee06a/mukherjee06a.pdf new file mode 100644 index 0000000..f7184d5 Binary files /dev/null and b/mukherjee06a/mukherjee06a.pdf differ diff --git a/mukherjee06b/info.json b/mukherjee06b/info.json new file mode 100644 index 0000000..9bceb05 --- /dev/null +++ b/mukherjee06b/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We introduce an algorithm that simultaneously estimates a\nclassification function as well as its gradient in\nthe supervised learning framework. The motivation for the\nalgorithm is to find salient variables and estimate\nhow they covary. An efficient implementation with respect\nto both memory and time is given. The utility of the\nalgorithm is illustrated on simulated data as well as a gene\nexpression data set. 
An error analysis is given for\nthe convergence of the estimate of the classification function\nand its gradient to the true classification function and\ntrue gradient.", + "authors": [ + "Sayan Mukherjee", + "Qiang Wu" + ], + "id": "mukherjee06b", + "issue": 87, + "pages": [ + 2481, + 2514 + ], + "title": "Estimation of Gradients and Coordinate Covariation in Classification", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/mukherjee06b/mukherjee06b.pdf b/mukherjee06b/mukherjee06b.pdf new file mode 100644 index 0000000..46bca12 Binary files /dev/null and b/mukherjee06b/mukherjee06b.pdf differ diff --git a/munos06a/info.json b/munos06a/info.json new file mode 100644 index 0000000..4603df5 --- /dev/null +++ b/munos06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "

\nWe study a variance reduction technique for Monte Carlo estimation\nof functionals in Markov chains. The method is based on designing\nsequential control variates using successive approximations\nof the function of interest V. Regular Monte Carlo estimates have\na variance of O(1/N), where N is the number of sample trajectories\nof the Markov chain. Here, we obtain a geometric variance reduction\nO(ρ^N) (with ρ<1) up to a threshold that depends on\nthe approximation error V-AV, where A is an approximation\noperator linear in the values. Thus, if V belongs to the right\napproximation space (i.e. AV=V), the variance decreases geometrically\nto zero.\n

\nAn immediate application is value function estimation in Markov chains,\nwhich may be used for policy evaluation in a policy iteration algorithm\nfor solving Markov Decision Processes. \n

\nAnother important domain, for which variance reduction is highly needed,\nis gradient estimation, that is, computing the sensitivity ∂_αV\nof the performance measure V with respect to some parameter α\nof the transition probabilities. For example, in parametric policy\noptimization, computing an estimate of the policy gradient is required\nto perform a gradient-based optimization method.\n

\nWe show that, using two approximations for the value function\nand the gradient, a geometric variance reduction is also achieved,\nup to a threshold that depends on the approximation errors of both\nof those representations.\n

", + "authors": [ + "R{{\\'e}}mi Munos" + ], + "id": "munos06a", + "issue": 13, + "pages": [ + 413, + 427 + ], + "title": "Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/munos06a/munos06a.pdf b/munos06a/munos06a.pdf new file mode 100644 index 0000000..9c67e44 Binary files /dev/null and b/munos06a/munos06a.pdf differ diff --git a/munos06b/info.json b/munos06b/info.json new file mode 100644 index 0000000..4bccc6a --- /dev/null +++ b/munos06b/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "

\nPolicy search is a method for approximately solving an optimal\ncontrol problem by performing a parametric optimization search in\na given class of parameterized policies. In order to apply a local\noptimization technique, such as a gradient method, we wish to evaluate\nthe sensitivity of the performance measure with respect to the policy\nparameters, the so-called policy gradient. This paper is concerned\nwith the estimation of the policy gradient for continuous-time,\ndeterministic state dynamics, in a reinforcement learning framework,\nthat is, when the decision maker does not have a model of the state\ndynamics.\n

\n

\nWe show that the usual likelihood ratio methods used in discrete time\nfail to estimate the gradient because they are subject to variance\nexplosion when the discretization time-step decreases to 0. We\ndescribe an alternative approach based on the approximation of the\npathwise derivative, which leads to a policy gradient estimate that\nconverges almost surely to the true gradient when the time-step tends\nto 0. The underlying idea starts with the derivation of an explicit\nrepresentation of the policy gradient using pathwise derivation. This\nderivation makes use of the knowledge of the state dynamics. Then,\nin order to estimate the gradient from the observable data only, we\nuse a stochastic policy to discretize the continuous deterministic\nsystem into a stochastic discrete process, which enables us to replace\nthe unknown coefficients by quantities that solely depend on known\ndata. We prove the almost sure convergence of this estimate to the\ntrue policy gradient when the discretization time-step goes to zero.\n

\n

\nThe method is illustrated on two target problems, in discrete\nand continuous control spaces.\n

", + "authors": [ + "R{{\\'e}}mi Munos" + ], + "id": "munos06b", + "issue": 26, + "pages": [ + 771, + 791 + ], + "title": "Policy Gradient in Continuous Time", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/munos06b/munos06b.pdf b/munos06b/munos06b.pdf new file mode 100644 index 0000000..559c9a6 Binary files /dev/null and b/munos06b/munos06b.pdf differ diff --git a/niculescu06a/info.json b/niculescu06a/info.json new file mode 100644 index 0000000..b2ca073 --- /dev/null +++ b/niculescu06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "

\nThe task of learning models for many real-world problems requires\nincorporating domain knowledge into learning algorithms, to enable\naccurate learning from a realistic volume of training data. This paper\nconsiders a variety of types of domain knowledge for constraining\nparameter estimates when learning Bayesian networks. In particular, we\nconsider domain knowledge that constrains the values or relationships\namong subsets of parameters in a Bayesian network with known\nstructure.\n

\nWe incorporate a wide variety of parameter constraints into learning\nprocedures for Bayesian networks, by formulating this task as a\nconstrained optimization problem. The assumptions made in module\nnetworks, dynamic Bayes nets and context specific independence models\ncan be viewed as particular cases of such parameter constraints. We\npresent closed form solutions or fast iterative algorithms for\nestimating parameters subject to several specific classes of parameter\nconstraints, including equalities and inequalities among parameters,\nconstraints on individual parameters, and constraints on sums and\nratios of parameters, for discrete and continuous variables. Our\nmethods cover learning from both frequentist and Bayesian points of\nview, from both complete and incomplete data.\n

\nWe present formal guarantees for our estimators, as well as methods\nfor automatically learning useful parameter constraints from data. To\nvalidate our approach, we apply it to the domain of fMRI brain image\nanalysis. Here we demonstrate the ability of our system to first learn\nuseful relationships among parameters, and then to use them to\nconstrain the training of the Bayesian network, resulting in improved\ncross-validated accuracy of the learned model. Experiments on\nsynthetic data are also presented.\n

", + "authors": [ + "Radu Stefan Niculescu", + "Tom M. Mitchell", + "R. Bharat Rao" + ], + "id": "niculescu06a", + "issue": 49, + "pages": [ + 1357, + 1383 + ], + "title": "Bayesian Network Learning with Parameter Constraints", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/niculescu06a/niculescu06a.pdf b/niculescu06a/niculescu06a.pdf new file mode 100644 index 0000000..948e561 Binary files /dev/null and b/niculescu06a/niculescu06a.pdf differ diff --git a/olsson06a/info.json b/olsson06a/info.json new file mode 100644 index 0000000..893ed86 --- /dev/null +++ b/olsson06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We apply a type of generative modelling to the problem of blind source\nseparation in which prior knowledge about the latent source signals,\nsuch as time-varying auto-correlation and quasi-periodicity, are\nincorporated into a linear state-space model. In simulations, we show\nthat in terms of signal-to-error ratio, the sources are inferred more\naccurately as a result of the inclusion of strong prior knowledge. We\nexplore different schemes of maximum-likelihood optimization for the\npurpose of learning the model parameters. The Expectation Maximization\nalgorithm, which is often considered the standard optimization method\nin this context, results in slow convergence when the noise variance\nis small. In such scenarios, quasi-Newton optimization yields\nsubstantial improvements in a range of signal to noise ratios. We\nanalyze the performance of the methods on convolutive mixtures of\nspeech signals.", + "authors": [ + "Rasmus Kongsgaard Olsson", + "Lars Kai Hansen" + ], + "id": "olsson06a", + "issue": 91, + "pages": [ + 2585, + 2602 + ], + "title": "Linear State-Space Models for Blind Source Separation", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/olsson06a/olsson06a.pdf b/olsson06a/olsson06a.pdf new file mode 100644 index 0000000..3825a4b Binary files /dev/null and b/olsson06a/olsson06a.pdf differ diff --git a/passerini06a/info.json b/passerini06a/info.json new file mode 100644 index 0000000..ddbbd9e --- /dev/null +++ b/passerini06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We develop kernels for measuring the similarity between relational\ninstances using background knowledge expressed in first-order logic.\nThe method allows us to bridge the gap between traditional inductive\nlogic programming (ILP) representations and statistical approaches\nto supervised learning. Logic programs are first used to generate\nproofs of given visitor programs that use predicates declared in the\navailable background knowledge. A kernel is then defined over pairs\nof proof trees. The method can be used for supervised learning tasks\nand is suitable for classification as well as regression. 
We report\npositive empirical results on Bongard-like and M-of-N problems\nthat are difficult or impossible to solve with traditional ILP\ntechniques, as well as on real bioinformatics and chemoinformatics\ndata sets.", + "authors": [ + "Andrea Passerini", + "Paolo Frasconi", + "Luc De Raedt" + ], + "id": "passerini06a", + "issue": 10, + "pages": [ + 307, + 342 + ], + "title": "Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/passerini06a/passerini06a.pdf b/passerini06a/passerini06a.pdf new file mode 100644 index 0000000..0be6701 Binary files /dev/null and b/passerini06a/passerini06a.pdf differ diff --git a/peer06a/info.json b/peer06a/info.json new file mode 100644 index 0000000..0eb4abf --- /dev/null +++ b/peer06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "In recent years, there has been a growing interest in applying Bayesian\nnetworks and their extensions to reconstruct regulatory networks from\ngene expression data. Since the gene expression domain involves a large\nnumber of variables and a limited number of samples, it poses both\ncomputational and statistical challenges to Bayesian network learning\nalgorithms. Here we define a constrained family of Bayesian network\nstructures suitable for this domain and devise an efficient search algorithm\nthat utilizes these structural constraints to find high scoring networks\nfrom data. Interestingly, under reasonable assumptions on the underlying\nprobability distribution, we can provide performance guarantees on our\nalgorithm. Evaluation on real data from yeast and mouse, demonstrates that\nour method cannot only reconstruct a high quality model of the yeast\nregulatory network, but is also the first method to scale to the complexity\nof mammalian networks and successfully reconstructs a reasonable model over\nthousands of variables.", + "authors": [ + "Dana Pe'er", + "Amos Tanay", + "Aviv Regev" + ], + "id": "peer06a", + "issue": 6, + "pages": [ + 167, + 189 + ], + "title": "MinReg: A Scalable Algorithm for Learning Parsimonious Regulatory Networks in Yeast and Mammals", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/peer06a/peer06a.pdf b/peer06a/peer06a.pdf new file mode 100644 index 0000000..49d038d Binary files /dev/null and b/peer06a/peer06a.pdf differ diff --git a/porta06a/info.json b/porta06a/info.json new file mode 100644 index 0000000..745664a --- /dev/null +++ b/porta06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "We propose a novel approach to optimize Partially Observable Markov\nDecisions Processes (POMDPs) defined on continuous spaces. To date,\nmost algorithms for model-based POMDPs are restricted to discrete\nstates, actions, and observations, but many real-world problems such\nas, for instance, robot navigation, are naturally defined on\ncontinuous spaces. In this work, we demonstrate that the value\nfunction for continuous POMDPs is convex in the beliefs over\ncontinuous state spaces, and piecewise-linear convex for the\nparticular case of discrete observations and actions but still\ncontinuous states. We also demonstrate that continuous Bellman backups\nare contracting and isotonic ensuring the monotonic convergence of\nvalue-iteration algorithms. 
Relying on those properties, we extend the\nalgorithm, originally developed for discrete POMDPs, to work in\ncontinuous state spaces by representing the observation, transition,\nand reward models using Gaussian mixtures, and the beliefs using\nGaussian mixtures or particle sets. With these representations, the\nintegrals that appear in the Bellman backup can be computed in closed\nform and, therefore, the algorithm is computationally\nfeasible. Finally, we further extend to deal with continuous action\nand observation sets by designing effective sampling approaches.", + "authors": [ + "Josep M. Porta", + "Nikos Vlassis", + "Matthijs T.J. Spaan", + "Pascal Poupart" + ], + "id": "porta06a", + "issue": 82, + "pages": [ + 2329, + 2367 + ], + "title": "Point-Based Value Iteration for Continuous POMDPs", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/porta06a/porta06a.pdf b/porta06a/porta06a.pdf new file mode 100644 index 0000000..27fdafc Binary files /dev/null and b/porta06a/porta06a.pdf differ diff --git a/raghavan06a/info.json b/raghavan06a/info.json new file mode 100644 index 0000000..45f4f6d --- /dev/null +++ b/raghavan06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We extend the traditional active learning framework to include\nfeedback on features in addition to labeling instances, and we execute\na careful study of the effects of feature selection and human feedback\non features in the setting of text categorization. Our experiments on\na variety of categorization tasks indicate that there is significant\npotential in improving classifier performance by feature re-weighting,\nbeyond that achieved via membership queries alone (traditional active\nlearning) if we have access to an oracle that can point to the\nimportant (most predictive) features. Our experiments on human\nsubjects indicate that human feedback on feature relevance can\nidentify a sufficient proportion of the most relevant features (over\n50% in our experiments). We find that on average, labeling a feature\ntakes much less time than labeling a document. We devise an algorithm\nthat interleaves labeling features and documents which significantly\naccelerates standard active learning in our simulation experiments.\nFeature feedback can complement traditional active learning in\napplications such as news filtering, e-mail classification, and\npersonalization, where the human teacher can have significant\nknowledge on the relevance of features.", + "authors": [ + "Hema Raghavan", + "Omid Madani", + "Rosie Jones" + ], + "id": "raghavan06a", + "issue": 60, + "pages": [ + 1655, + 1686 + ], + "title": "Active Learning with Feedback on Features and Instances", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/raghavan06a/raghavan06a.pdf b/raghavan06a/raghavan06a.pdf new file mode 100644 index 0000000..c5bb5eb Binary files /dev/null and b/raghavan06a/raghavan06a.pdf differ diff --git a/ross06a/info.json b/ross06a/info.json new file mode 100644 index 0000000..375c9d6 --- /dev/null +++ b/ross06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "Many perceptual models and theories hinge on treating objects as a\ncollection of constituent parts. When applying these approaches to\ndata, a fundamental problem arises: how can we determine what are the\nparts?\nWe attack this problem using learning, proposing a form of generative \nlatent factor model, in which each\ndata dimension is allowed to select a different factor or part as its\nexplanation. 
This approach permits a range of variations that\nposit different models for the appearance of a part. \nHere we provide the details for two such models: a\ndiscrete and a continuous one.\n Further, we show that this latent factor model can be extended\nhierarchically to account for correlations between the appearances of\ndifferent parts. This permits modelling of data consisting of\nmultiple categories, and learning these categories simultaneously\nwith the parts when they are unobserved. Experiments demonstrate the\nability to learn parts-based representations, and categories, of\nfacial images and user-preference data.", + "authors": [ + "David A. Ross", + "Richard S. Zemel" + ], + "id": "ross06a", + "issue": 83, + "pages": [ + 2369, + 2397 + ], + "title": "Learning Parts-Based Representations of Data", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/ross06a/ross06a.pdf b/ross06a/ross06a.pdf new file mode 100644 index 0000000..9dee28e Binary files /dev/null and b/ross06a/ross06a.pdf differ diff --git a/rousu06a/info.json b/rousu06a/info.json new file mode 100644 index 0000000..57d8771 --- /dev/null +++ b/rousu06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "

\nWe present a kernel-based algorithm for hierarchical text\nclassification where the documents are allowed to belong to more\nthan one category at a time. The classification model is a variant\nof the Maximum Margin Markov Network framework, where the\nclassification hierarchy is represented as a Markov tree equipped\nwith an exponential family defined on the edges.\nWe describe an efficient optimization\nalgorithm based on incremental conditional gradient ascent in\nsingle-example subspaces spanned by the marginal dual variables.\nThe optimization is facilitated by a dynamic-programming-based\nalgorithm that computes the best update directions in the feasible set.\n

\nExperiments show that the algorithm can feasibly optimize training\nsets of thousands of examples and classification hierarchies\nconsisting of hundreds of nodes. Training of the full hierarchical\nmodel is as efficient as training independent SVM-light classifiers\nfor each node. The algorithm's predictive accuracy was found to be\ncompetitive with other recently introduced hierarchical multi-category\nor multilabel classification learning algorithms.\n

", + "authors": [ + "Juho Rousu", + "Craig Saunders", + "Sandor Szedmak", + "John Shawe-Taylor" + ], + "id": "rousu06a", + "issue": 58, + "pages": [ + 1601, + 1626 + ], + "title": "Kernel-Based Learning of Hierarchical Multilabel Classification Models", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/rousu06a/rousu06a.pdf b/rousu06a/rousu06a.pdf new file mode 100644 index 0000000..732a7e8 Binary files /dev/null and b/rousu06a/rousu06a.pdf differ diff --git a/roverato06a/info.json b/roverato06a/info.json new file mode 100644 index 0000000..aa8690e --- /dev/null +++ b/roverato06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "This paper deals with chain graph models under alternative AMP\ninterpretation. A new representative of an AMP Markov equivalence\nclass, called the largest deflagged graph, is proposed.\nThe representative is based on revealed internal structure of the\nAMP Markov equivalence class. More specifically, the AMP Markov\nequivalence class decomposes into finer strong equivalence\nclasses and there exists a distinguished strong equivalence class\namong those forming the AMP Markov equivalence class. The largest\ndeflagged graph is the largest chain graph in that distinguished\nstrong equivalence class. A composed graphical procedure to get\nthe largest deflagged graph on the basis of any AMP Markov equivalent\nchain graph is presented. In general, the largest deflagged graph\ndiffers from the AMP essential graph, which is another\nrepresentative of the AMP Markov equivalence class.", + "authors": [ + "Alberto Roverato", + "Milan Studený" + ], + "id": "roverato06a", + "issue": 37, + "pages": [ + 1045, + 1078 + ], + "title": "A Graphical Representation of Equivalence Classes of AMP Chain Graphs", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/roverato06a/roverato06a.pdf b/roverato06a/roverato06a.pdf new file mode 100644 index 0000000..4d6ac78 Binary files /dev/null and b/roverato06a/roverato06a.pdf differ diff --git a/ryabko06a/info.json b/ryabko06a/info.json new file mode 100644 index 0000000..c3e17f0 --- /dev/null +++ b/ryabko06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "

\nIn this work we consider the task of relaxing the i.i.d. assumption\nin pattern recognition (or classification), aiming to make\nexisting learning algorithms applicable to a wider range of tasks.\nPattern recognition is the task of guessing a discrete label of\nsome object based on a set of given examples (pairs of\nobjects and labels). We consider the case\nof deterministically defined labels.\nTraditionally, this task is studied under the assumption that examples\nare independent and identically distributed. However,\nit turns out that many results of pattern recognition\ntheory carry over to a weaker assumption: objects are assumed to be\nconditionally independent and identically distributed,\nwhile the only assumption on the distribution of labels is that the\nrate of occurrence of each label is above some positive threshold.\n

\nWe find a broad class of learning algorithms for which estimates of\nthe probability of the classification error achieved under the\nclassical i.i.d. assumption can\nbe generalized to similar estimates for the case of\nconditionally i.i.d. examples.\n

", + "authors": [ + "Daniil Ryabko" + ], + "id": "ryabko06a", + "issue": 22, + "pages": [ + 645, + 664 + ], + "title": "Pattern Recognition for Conditionally Independent Data", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/ryabko06a/ryabko06a.pdf b/ryabko06a/ryabko06a.pdf new file mode 100644 index 0000000..feb8bfd Binary files /dev/null and b/ryabko06a/ryabko06a.pdf differ diff --git a/sahbi06a/info.json b/sahbi06a/info.json new file mode 100644 index 0000000..e156275 --- /dev/null +++ b/sahbi06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "

\nWe introduce a computational design for pattern detection\nbased on a tree-structured network of support vector machines (SVMs).\nAn SVM is associated with each cell in a recursive partitioning of the\nspace of patterns (hypotheses) into increasingly finer subsets. The\nhierarchy is traversed coarse-to-fine and each chain of positive\nresponses from the root to a leaf constitutes a detection. Our\nobjective is to design and build a network which balances overall\nerror and computation.\n

\nInitially, SVMs are constructed for each cell with no\nconstraints. This \"free network\" is then perturbed, cell by cell,\ninto another network, which is \"graded\" in two ways: first, the\nnumber of support vectors of each SVM is reduced (by clustering) in\norder to adjust to a pre-determined, increasing function of cell\ndepth; second, the decision boundaries are shifted to preserve all\npositive responses from the original set of training data. The limits\non the numbers of clusters (virtual support vectors) result from\nminimizing the mean computational cost of collecting all detections\nsubject to a bound on the expected number of false positives.\n

\nWhen applied to detecting faces in cluttered scenes, the\npatterns correspond to poses and the free network is already faster\nand more accurate than applying a single pose-specific SVM many times.\nThe graded network promotes very rapid processing of background\nregions while maintaining the discriminatory power of the free\nnetwork.\n

", + "authors": [ + "Hichem Sahbi", + "Donald Geman" + ], + "id": "sahbi06a", + "issue": 74, + "pages": [ + 2087, + 2123 + ], + "title": "A Hierarchy of Support Vector Machines for Pattern Detection", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/sahbi06a/sahbi06a.pdf b/sahbi06a/sahbi06a.pdf new file mode 100644 index 0000000..80e2e75 Binary files /dev/null and b/sahbi06a/sahbi06a.pdf differ diff --git a/scheinberg06a/info.json b/scheinberg06a/info.json new file mode 100644 index 0000000..8d8d493 --- /dev/null +++ b/scheinberg06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "We propose an active set algorithm to solve the convex \nquadratic programming (QP) problem which is the core of \nthe support vector machine (SVM) training. \nThe underlying method is not new and is based on the \nextensive practice of the Simplex method and its variants\nfor convex quadratic problems. However, its application\nto large-scale SVM problems is new. Until recently the \ntraditional active set methods were considered impractical for\nlarge SVM problems. By adapting the methods to the special \nstructure of SVM problems we were able to produce an efficient \nimplementation. We conduct an extensive study of the behavior \nof our method and its variations on SVM problems.\n We present computational results comparing our method with\nJoachims' SVMlight (see Joachims, 1999). \nThe results show that our method has overall\nbetter performance on many SVM problems. It seems to have \na particularly strong advantage on more difficult problems. \nIn addition this algorithm has better theoretical properties \nand it naturally extends to the incremental mode. Since \nthe proposed method solves the standard SVM formulation, as \ndoes SVMlight, the generalization properties of these \ntwo approaches are identical and we do not discuss them in \nthe paper.", + "authors": [ + "Katya Scheinberg" + ], + "id": "scheinberg06a", + "issue": 79, + "pages": [ + 2237, + 2257 + ], + "title": "An Efficient Implementation of an Active Set Method for SVMs", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/scheinberg06a/scheinberg06a.pdf b/scheinberg06a/scheinberg06a.pdf new file mode 100644 index 0000000..304a75c Binary files /dev/null and b/scheinberg06a/scheinberg06a.pdf differ diff --git a/schmitt06a/info.json b/schmitt06a/info.json new file mode 100644 index 0000000..372fc27 --- /dev/null +++ b/schmitt06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "

\n Fast and frugal heuristics are well studied models of bounded rationality.\n Psychological research has proposed the take-the-best heuristic as a\n successful strategy in decision making with limited resources.\n Take-the-best searches for a sufficiently good ordering of cues (or\n features) in a task where objects are to be compared lexicographically. We\n investigate the computational complexity of finding optimal cue permutations\n for lexicographic strategies and prove that the problem is NP-complete. It\n follows that no efficient (that is, polynomial-time) algorithm computes\n optimal solutions, unless P=NP. We further analyze the complexity of\n approximating optimal cue permutations for lexicographic strategies. We\n show that there is no efficient algorithm that approximates the optimum to\n within any constant factor, unless P=NP.\n

\n

\n The results have implications for the complexity of learning lexicographic\n strategies from examples. They show that learning them in polynomial time\n within the model of agnostic probably approximately correct (PAC) learning\n is impossible, unless RP=NP. We further consider greedy approaches for\n building lexicographic strategies and determine upper and lower bounds for\n the performance ratio of simple algorithms. Moreover, we present a greedy\n algorithm that performs provably better than take-the-best. Tight bounds on\n the sample complexity for learning lexicographic strategies are also given\n in this article.\n

", + "authors": [ + "Michael Schmitt", + "Laura Martignon" + ], + "id": "schmitt06a", + "issue": 2, + "pages": [ + 55, + 83 + ], + "title": "On the Complexity of Learning Lexicographic Strategies", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/schmitt06a/schmitt06a.pdf b/schmitt06a/schmitt06a.pdf new file mode 100644 index 0000000..7ba902f Binary files /dev/null and b/schmitt06a/schmitt06a.pdf differ diff --git a/schraudolph06a/info.json b/schraudolph06a/info.json new file mode 100644 index 0000000..91dea89 --- /dev/null +++ b/schraudolph06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "This paper presents an online support vector machine (SVM) that uses\nthe stochastic meta-descent (SMD) algorithm to adapt its step size\nautomatically. We formulate the online learning problem as a\nstochastic gradient descent in reproducing kernel Hilbert space\n(RKHS) and translate SMD to the nonparametric setting, where its\ngradient trace parameter is no longer a coefficient vector but an\nelement of the RKHS. We derive efficient updates that allow us to\nperform the step size adaptation in linear time. We apply the\nonline SVM framework to a variety of loss functions, and in\nparticular show how to handle structured output spaces and achieve\nefficient online multiclass classification. Experiments show that\nour algorithm outperforms more primitive methods for setting the\ngradient step size.", + "authors": [ + "S. V. N. Vishwanathan", + "Nicol N. Schraudolph", + "Alex J. Smola" + ], + "id": "schraudolph06a", + "issue": 39, + "pages": [ + 1107, + 1133 + ], + "title": "Step Size Adaptation in Reproducing Kernel Hilbert Space", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/schraudolph06a/schraudolph06a.pdf b/schraudolph06a/schraudolph06a.pdf new file mode 100644 index 0000000..eff3a26 Binary files /dev/null and b/schraudolph06a/schraudolph06a.pdf differ diff --git a/scott06a/info.json b/scott06a/info.json new file mode 100644 index 0000000..f8b6418 --- /dev/null +++ b/scott06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "Given a probability measure P and a reference measure\nμ, one is often interested in the minimum μ-measure set\nwith P-measure at least α. Minimum volume sets of this\ntype summarize the regions of greatest probability mass of P,\nand are useful for detecting anomalies and constructing confidence\nregions. This paper addresses the problem of estimating minimum\nvolume sets based on independent samples distributed according to\nP. Other than these samples, no other information is available\nregarding P, but the reference measure μ is assumed to be\nknown. We introduce rules for estimating minimum volume sets that\nparallel the empirical risk minimization and structural risk\nminimization principles in classification. As in classification, we\nshow that the performances of our estimators are controlled by the\nrate of uniform convergence of empirical to true probabilities over\nthe class from which the estimator is drawn. Thus we obtain finite\nsample size performance bounds in terms of VC dimension and related\nquantities. We also demonstrate strong universal consistency, an\noracle inequality, and rates of convergence. The proposed estimators\nare illustrated with histogram and decision tree set estimation\nrules.", + "authors": [ + "Clayton D. Scott", + "Robert D. 
Nowak" + ], + "id": "scott06a", + "issue": 23, + "pages": [ + 665, + 704 + ], + "title": "Learning Minimum Volume Sets", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/scott06a/scott06a.pdf b/scott06a/scott06a.pdf new file mode 100644 index 0000000..7867b8b Binary files /dev/null and b/scott06a/scott06a.pdf differ diff --git a/shalev-shwartz06a/info.json b/shalev-shwartz06a/info.json new file mode 100644 index 0000000..d23c72e --- /dev/null +++ b/shalev-shwartz06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We discuss the problem of learning to rank labels from a real valued\nfeedback associated with each label. We cast the feedback as a\npreferences graph where the nodes of the graph are the labels and\nedges express preferences over labels. We tackle the learning problem\nby defining a loss function for comparing a predicted graph with a\nfeedback graph. This loss is materialized by decomposing the feedback\ngraph into bipartite sub-graphs. We then adopt the maximum-margin\nframework which leads to a quadratic optimization problem with linear\nconstraints. While the size of the problem grows quadratically with\nthe number of the nodes in the feedback graph, we derive a problem of\na significantly smaller size and prove that it attains the same\nminimum. We then describe an efficient algorithm, called SOPOPO, for\nsolving the reduced problem by employing a soft projection onto the\npolyhedron defined by a reduced set of constraints. We also describe\nand analyze a wrapper procedure for batch learning when multiple\ngraphs are provided for training. We conclude with a set of\nexperiments which show significant improvements in run time over a\nstate of the art interior-point algorithm.", + "authors": [ + "Shai Shalev-Shwartz", + "Yoram Singer" + ], + "id": "shalev-shwartz06a", + "issue": 57, + "pages": [ + 1567, + 1599 + ], + "title": "Efficient Learning of Label Ranking by Soft Projections onto Polyhedra", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/shalev-shwartz06a/shalev-shwartz06a.pdf b/shalev-shwartz06a/shalev-shwartz06a.pdf new file mode 100644 index 0000000..4407cfd Binary files /dev/null and b/shalev-shwartz06a/shalev-shwartz06a.pdf differ diff --git a/shimizu06a/info.json b/shimizu06a/info.json new file mode 100644 index 0000000..fe6a659 --- /dev/null +++ b/shimizu06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "In recent years, several methods have been proposed for the discovery\nof causal structure from non-experimental data. Such methods make\nvarious assumptions on the data generating process to facilitate its\nidentification from purely observational data. Continuing this line of\nresearch, we show how to discover the complete causal structure of\ncontinuous-valued data, under the assumptions that (a) the data\ngenerating process is linear, (b) there are no unobserved confounders,\nand (c) disturbance variables have non-Gaussian distributions of\nnon-zero variances. The solution relies on the use of the statistical\nmethod known as independent component analysis, and does not require\nany pre-specified time-ordering of the variables. We provide a\ncomplete Matlab package for performing this LiNGAM analysis (short for\nLinear Non-Gaussian Acyclic Model), and demonstrate the effectiveness\nof the method using artificially generated data and real-world data.", + "authors": [ + "Shohei Shimizu", + "Patrik O. 
Hoyer", + "Aapo Hyvärinen", + "Antti Kerminen" + ], + "id": "shimizu06a", + "issue": 71, + "pages": [ + 2003, + 2030 + ], + "title": "A Linear Non-Gaussian Acyclic Model for Causal Discovery", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/shimizu06a/shimizu06a.pdf b/shimizu06a/shimizu06a.pdf new file mode 100644 index 0000000..2e807ea Binary files /dev/null and b/shimizu06a/shimizu06a.pdf differ diff --git a/shivaswamy06a/info.json b/shivaswamy06a/info.json new file mode 100644 index 0000000..27081fa --- /dev/null +++ b/shivaswamy06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We propose a novel second order cone programming formulation for\ndesigning robust classifiers which can handle uncertainty in\nobservations. Similar formulations are also derived for designing\nregression functions which are robust to uncertainties in the\nregression setting. The proposed formulations are independent of the\nunderlying distribution, requiring only the existence of second\norder moments. These formulations are then specialized to the case\nof missing values in observations for both classification and\nregression problems. Experiments show that the proposed\nformulations outperform imputation.", + "authors": [ + "Pannagadatta K. Shivaswamy", + "Chiranjib Bhattacharyya", + "Alexander J. Smola" + ], + "id": "shivaswamy06a", + "issue": 46, + "pages": [ + 1283, + 1314 + ], + "title": "Second Order Cone Programming Approaches for Handling Missing and Uncertain Data", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/shivaswamy06a/shivaswamy06a.pdf b/shivaswamy06a/shivaswamy06a.pdf new file mode 100644 index 0000000..d3c2dd9 Binary files /dev/null and b/shivaswamy06a/shivaswamy06a.pdf differ diff --git a/silva06a/info.json b/silva06a/info.json new file mode 100644 index 0000000..431658c --- /dev/null +++ b/silva06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "We describe anytime search procedures that (1) find disjoint subsets\nof recorded variables for which the members of each subset are\nd-separated by a single common unrecorded cause, if such exists; (2)\nreturn information about the causal relations among the latent factors\nso identified. We prove the procedure is point-wise consistent\nassuming (a) the causal relations can be represented by a directed\nacyclic graph (DAG) satisfying the Markov Assumption and the\nFaithfulness Assumption; (b) unrecorded variables are not caused by\nrecorded variables; and (c) dependencies are linear. We compare the procedure with\nstandard approaches over a variety of simulated structures and sample sizes, and\nillustrate its practical value with brief studies of social science\ndata sets. Finally, we consider generalizations for non-linear\nsystems.", + "authors": [ + "Ricardo Silva", + "Richard Scheine", + "Clark Glymour", + "Peter Spirtes" + ], + "id": "silva06a", + "issue": 7, + "pages": [ + 191, + 246 + ], + "title": "Learning the Structure of Linear Latent Variable Models", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/silva06a/silva06a.pdf b/silva06a/silva06a.pdf new file mode 100644 index 0000000..655c0cf Binary files /dev/null and b/silva06a/silva06a.pdf differ diff --git a/singliar06a/info.json b/singliar06a/info.json new file mode 100644 index 0000000..7c9719f --- /dev/null +++ b/singliar06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We develop a new component analysis framework, the Noisy-Or\nComponent Analyzer (NOCA), that targets high-dimensional binary\ndata. 
NOCA is a probabilistic latent variable model that assumes the\nexpression of observed high-dimensional binary data is driven by a\nsmall number of hidden binary sources combined via noisy-or units.\nThe component analysis procedure is equivalent to learning of NOCA\nparameters. Since the classical EM formulation of the NOCA learning\nproblem is intractable, we develop its variational approximation. We\ntest the NOCA framework on two problems: (1) a synthetic\nimage-decomposition problem and (2) a co-citation data analysis\nproblem for thousands of CiteSeer documents. We demonstrate good\nperformance of the new model on both problems. In addition, we\ncontrast the model to two mixture-based latent-factor models: the\nprobabilistic latent semantic analysis (PLSA) and latent Dirichlet\nallocation (LDA). Differing assumptions underlying these models cause\nthem to discover different types of structure in co-citation data,\nthus illustrating the benefit of NOCA in building our understanding of\nhigh-dimensional data sets.", + "authors": [ + "Tom{{\\'a}}{\\v{s}} Šingliar", + "Milo{\\v{s}} Hauskrecht" + ], + "id": "singliar06a", + "issue": 77, + "pages": [ + 2189, + 2213 + ], + "title": "Noisy-OR Component Analysis and its Application to Link Analysis", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/singliar06a/singliar06a.pdf b/singliar06a/singliar06a.pdf new file mode 100644 index 0000000..13cd555 Binary files /dev/null and b/singliar06a/singliar06a.pdf differ diff --git a/sonnenburg06a/info.json b/sonnenburg06a/info.json new file mode 100644 index 0000000..a4f951e --- /dev/null +++ b/sonnenburg06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "While classical kernel-based learning algorithms are based on a single\nkernel, in practice it is often desirable to use multiple kernels.\nLanckriet et al. (2004) considered conic combinations of kernel\nmatrices for classification, leading to a convex quadratically\nconstrained quadratic program. We show that it can be rewritten as a\nsemi-infinite linear program that can be efficiently solved by\nrecycling the standard SVM implementations. Moreover, we generalize\nthe formulation and our method to a larger class of problems,\nincluding regression and one-class classification. Experimental\nresults show that the proposed algorithm works for hundred thousands of examples or\nhundreds of kernels to be combined, and helps for automatic model\nselection, improving the interpretability of the learning result. In a \nsecond part we discuss general speed up mechanism for\nSVMs, especially when used with sparse feature maps as appear\nfor string kernels, allowing us to train a string kernel SVM on a 10\nmillion real-world splice data set from computational biology. 
We\nintegrated multiple kernel learning in our machine learning toolbox\nSHOGUN for which the source code is publicly available \nat http://www.fml.tuebingen.mpg.de/raetsch/projects/shogun.", + "authors": [ + "S{{\\\"o}}ren Sonnenburg", + "Gunnar Rätsch", + "Christin Schäfer", + "Bernhard Sch{{\\\"o}}lkopf" + ], + "id": "sonnenburg06a", + "issue": 56, + "pages": [ + 1531, + 1565 + ], + "title": "Large Scale Multiple Kernel Learning", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/sonnenburg06a/sonnenburg06a.pdf b/sonnenburg06a/sonnenburg06a.pdf new file mode 100644 index 0000000..9859e61 Binary files /dev/null and b/sonnenburg06a/sonnenburg06a.pdf differ diff --git a/spratling06a/info.json b/spratling06a/info.json new file mode 100644 index 0000000..7531c45 --- /dev/null +++ b/spratling06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "In order to perform object recognition it is necessary to learn representations\nof the underlying components of images. Such components correspond to objects,\nobject-parts, or features. Non-negative matrix factorisation is a generative\nmodel that has been specifically proposed for finding such meaningful\nrepresentations of image data, through the use of non-negativity constraints on\nthe factors. This article reports on an empirical investigation of the\nperformance of non-negative matrix factorisation algorithms. It is found that\nsuch algorithms need to impose additional constraints on the sparseness of the\nfactors in order to successfully deal with occlusion. However, these constraints\ncan themselves result in these algorithms failing to identify image components\nunder certain conditions. In contrast, a recognition model (a competitive\nlearning neural network algorithm) reliably and accurately learns\nrepresentations of elementary image features without such constraints.", + "authors": [ + "Michael W. Spratling" + ], + "id": "spratling06a", + "issue": 27, + "pages": [ + 793, + 815 + ], + "title": "Learning Image Components for Object Recognition", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/spratling06a/spratling06a.pdf b/spratling06a/spratling06a.pdf new file mode 100644 index 0000000..9730843 Binary files /dev/null and b/spratling06a/spratling06a.pdf differ diff --git a/sugiyama06a/info.json b/sugiyama06a/info.json new file mode 100644 index 0000000..70b6423 --- /dev/null +++ b/sugiyama06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "The goal of active learning is to determine the locations of training\ninput points so that the generalization error is minimized. We\ndiscuss the problem of active learning in linear regression scenarios.\nTraditional active learning methods using least-squares learning often\nassume that the model used for learning is correctly specified. In\nmany practical situations, however, this assumption may not be\nfulfilled. Recently, active learning methods using\n\"importance\"-weighted least-squares learning have been proposed, which\nare shown to be robust against misspecification of models. In this\npaper, we propose a new active learning method also using the weighted\nleast-squares learning, which we call ALICE (Active Learning\nusing the Importance-weighted least-squares learning based on\nConditional Expectation of the generalization error). 
An important\ndifference from existing methods is that we predict the\nconditional expectation of the generalization error given\ntraining input points, while existing methods predict the full\nexpectation of the generalization error. Due to this difference, the\ntraining input design can be fine-tuned depending on the realization\nof training input points. Theoretically, we prove that the proposed\nactive learning criterion is a more accurate predictor of the\nsingle-trial generalization error than the existing criterion.\nNumerical studies with toy and benchmark data sets show that the\nproposed method compares favorably to existing methods.", + "authors": [ + "Masashi Sugiyama" + ], + "id": "sugiyama06a", + "issue": 5, + "pages": [ + 141, + 166 + ], + "title": "Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/sugiyama06a/sugiyama06a.pdf b/sugiyama06a/sugiyama06a.pdf new file mode 100644 index 0000000..9c7c3f3 Binary files /dev/null and b/sugiyama06a/sugiyama06a.pdf differ diff --git a/takeuchi06a/info.json b/takeuchi06a/info.json new file mode 100644 index 0000000..ad74437 --- /dev/null +++ b/takeuchi06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "In regression, the desired estimate of y|x is not always given by a\n conditional mean, although this is most common. Sometimes one wants to\n obtain a good estimate that satisfies the property that a proportion,\n τ, of y|x, will be below the estimate. For τ = 0.5 this is\n an estimate of the median. What might be called median\n regression, is subsumed under the term quantile regression. We\n present a nonparametric version of a quantile estimator, which can be\n obtained by solving a simple quadratic programming problem and provide\n uniform convergence statements and bounds on the quantile property of\n our estimator. Experimental results show the feasibility of the\n approach and competitiveness of our method with existing ones. We\n discuss several types of extensions including an approach to solve the\n quantile crossing problems, as well as a method to incorporate\n prior qualitative knowledge such as monotonicity constraints.", + "authors": [ + "Ichiro Takeuchi", + "Quoc V. Le", + "Timothy D. Sears", + "Alexander J. Smola" + ], + "id": "takeuchi06a", + "issue": 44, + "pages": [ + 1231, + 1264 + ], + "title": "Nonparametric Quantile Estimation", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/takeuchi06a/takeuchi06a.pdf b/takeuchi06a/takeuchi06a.pdf new file mode 100644 index 0000000..0d81272 Binary files /dev/null and b/takeuchi06a/takeuchi06a.pdf differ diff --git a/taskar06a/info.json b/taskar06a/info.json new file mode 100644 index 0000000..2dd565e --- /dev/null +++ b/taskar06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "We present a simple and scalable algorithm for maximum-margin\nestimation of structured output models, including an important\nclass of Markov networks and combinatorial models. We formulate\nthe estimation problem as a convex-concave saddle-point problem\nthat allows us to use simple projection methods based on the\ndual extragradient algorithm (Nesterov, 2003).\nThe projection step can be solved using\ndynamic programming or combinatorial algorithms for min-cost\nconvex flow, depending on the structure of the problem. 
We show\nthat this approach provides a memory-efficient alternative to\nformulations based on reductions to a quadratic program (QP). We\nanalyze the convergence of the method and present experiments on\ntwo very different structured prediction tasks: 3D image\nsegmentation and word alignment, illustrating the favorable\nscaling properties of our algorithm.", + "authors": [ + "Ben Taskar", + "Simon Lacoste-Julien", + "Michael I. Jordan" + ], + "id": "taskar06a", + "issue": 59, + "pages": [ + 1627, + 1653 + ], + "title": "Structured Prediction, Dual Extragradient and Bregman Projections", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/taskar06a/taskar06a.pdf b/taskar06a/taskar06a.pdf new file mode 100644 index 0000000..69dc179 Binary files /dev/null and b/taskar06a/taskar06a.pdf differ diff --git a/vert06a/info.json b/vert06a/info.json new file mode 100644 index 0000000..83619d0 --- /dev/null +++ b/vert06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "We determine the asymptotic behaviour of the function computed by\nsupport vector machines (SVM) and related algorithms that minimize a\nregularized empirical convex loss function in the reproducing kernel\nHilbert space of the Gaussian RBF kernel, in the situation where the\nnumber of examples tends to infinity, the bandwidth of the Gaussian\nkernel tends to 0, and the regularization parameter is held\nfixed. Non-asymptotic convergence bounds to this limit in the L2\nsense are provided, together with upper bounds on the classification\nerror that is shown to converge to the Bayes risk, therefore proving\nthe Bayes-consistency of a variety of methods although the\nregularization term does not vanish. These results are particularly\nrelevant to the one-class SVM, for which the regularization can not\nvanish by construction, and which is shown for the first time to be a\nconsistent density level set estimator.", + "authors": [ + "R{{\\'e}}gis Vert", + "Jean-Philippe Vert" + ], + "id": "vert06a", + "issue": 28, + "pages": [ + 817, + 854 + ], + "title": "Consistency and Convergence Rates of One-Class SVMs and Related Algorithms", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/vert06a/vert06a.pdf b/vert06a/vert06a.pdf new file mode 100644 index 0000000..9f57ec0 Binary files /dev/null and b/vert06a/vert06a.pdf differ diff --git a/wainwright06a/info.json b/wainwright06a/info.json new file mode 100644 index 0000000..a1a8848 --- /dev/null +++ b/wainwright06a/info.json @@ -0,0 +1,15 @@ +{ + "abstract": "Consider the problem of joint parameter estimation and prediction in a\nMarkov random field: that is, the model parameters are estimated on the\nbasis of an initial set of data, and then the fitted model is used to\nperform prediction (e.g., smoothing, denoising, interpolation) on a\nnew noisy observation. Working under the restriction of limited\ncomputation, we analyze a joint method in which the same convex\nvariational relaxation is used to construct an M-estimator for\nfitting parameters, and to perform approximate marginalization for the\nprediction step. The key result of this paper is that in the\ncomputation-limited setting, using an inconsistent parameter estimator\n(i.e., an estimator that returns the \"wrong\" model even in the\ninfinite data limit) is provably beneficial, since the resulting\nerrors can partially compensate for errors made by using an\napproximate prediction technique. 
En route to this result, we analyze\nthe asymptotic properties of M-estimators based on convex variational\nrelaxations, and establish a Lipschitz stability property that holds\nfor a broad class of convex variational methods. This stability\nresult provides additional incentive, apart from the obvious benefit\nof unique global optima, for using message-passing methods based on\nconvex variational relaxations. We show that joint\nestimation/prediction based on the reweighted sum-product algorithm\nsubstantially outperforms a commonly used heuristic based on ordinary\nsum-product.", + "authors": [ + "Martin J. Wainwright" + ], + "id": "wainwright06a", + "issue": 65, + "pages": [ + 1829, + 1859 + ], + "title": "Estimating the ``Wrong'' Graphical Model: Benefits in the Computation-Limited Setting", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/wainwright06a/wainwright06a.pdf b/wainwright06a/wainwright06a.pdf new file mode 100644 index 0000000..5ca7913 Binary files /dev/null and b/wainwright06a/wainwright06a.pdf differ diff --git a/watanabe06a/info.json b/watanabe06a/info.json new file mode 100644 index 0000000..54f3e4a --- /dev/null +++ b/watanabe06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "

\nBayesian learning has been widely used and proved to be effective in many\ndata modeling problems. However, the computations it involves are\nexpensive and generally cannot be performed exactly. The variational\nBayesian approach, proposed as an approximation of Bayesian learning,\nhas provided computational tractability and good generalization\nperformance in many applications.\n

\nThe properties and capabilities of variational Bayesian learning itself have not\nyet been clarified. It is still unknown how good an approximation the\nvariational Bayesian approach can achieve. In this paper, we discuss\nvariational Bayesian learning of Gaussian\nmixture models and derive upper and lower bounds on the variational\nstochastic complexity. The variational stochastic complexity,\nwhich corresponds to the minimum variational free energy and a lower\nbound on the Bayesian evidence, is not only important for\naddressing the model selection problem, but also enables us to discuss the\naccuracy of the variational Bayesian approach as an approximation of\ntrue Bayesian learning.\n

", + "authors": [ + "Kazuho Watanabe", + "Sumio Watanabe" + ], + "id": "watanabe06a", + "issue": 21, + "pages": [ + 625, + 644 + ], + "title": "Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/watanabe06a/watanabe06a.pdf b/watanabe06a/watanabe06a.pdf new file mode 100644 index 0000000..3b004c9 Binary files /dev/null and b/watanabe06a/watanabe06a.pdf differ diff --git a/whiteson06a/info.json b/whiteson06a/info.json new file mode 100644 index 0000000..bf90841 --- /dev/null +++ b/whiteson06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "Temporal difference methods are theoretically grounded and empirically\neffective methods for addressing reinforcement learning problems.\nIn most real-world reinforcement learning tasks, TD methods require\na function approximator to represent the value function. However,\nusing function approximators requires manually making crucial\nrepresentational decisions. This paper investigates\nevolutionary function approximation, a novel approach to\nautomatically selecting function approximator representations that\nenable efficient individual learning. This method evolves\nindividuals that are better able to learn. We present a\nfully implemented instantiation of evolutionary function\napproximation which combines NEAT, a neuroevolutionary optimization\ntechnique, with Q-learning, a popular TD method. The resulting\nNEAT+Q algorithm automatically discovers effective representations\nfor neural network function approximators. This paper also presents\non-line evolutionary computation, which improves the on-line\nperformance of evolutionary computation by borrowing selection\nmechanisms used in TD methods to choose individual actions and using\nthem in evolutionary computation to select policies for evaluation.\nWe evaluate these contributions with extended empirical studies in\ntwo domains: 1) the mountain car task, a standard reinforcement\nlearning benchmark on which neural network function approximators\nhave previously performed poorly and 2) server job scheduling, a\nlarge probabilistic domain drawn from the field of autonomic\ncomputing. The results demonstrate that evolutionary function\napproximation can significantly improve the performance of TD\nmethods and on-line evolutionary computation can significantly\nimprove evolutionary methods. This paper also presents additional\ntests that offer insight into what factors can make neural network\nfunction approximation difficult in practice.", + "authors": [ + "Shimon Whiteson", + "Peter Stone" + ], + "id": "whiteson06a", + "issue": 30, + "pages": [ + 877, + 917 + ], + "title": "Evolutionary Function Approximation for Reinforcement Learning", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/whiteson06a/whiteson06a.pdf b/whiteson06a/whiteson06a.pdf new file mode 100644 index 0000000..6f7c1c8 Binary files /dev/null and b/whiteson06a/whiteson06a.pdf differ diff --git a/wright06a/info.json b/wright06a/info.json new file mode 100644 index 0000000..75ee174 --- /dev/null +++ b/wright06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "Several fundamental security mechanisms for restricting access to\nnetwork resources rely on the ability of a reference monitor to\ninspect the contents of traffic as it traverses the network. 
However,\nwith the increasing popularity of cryptographic protocols, the\ntraditional means of inspecting packet contents to enforce security\npolicies is no longer a viable approach as message contents are\nconcealed by encryption. In this paper, we investigate the extent to\nwhich common application protocols can be identified using only the\nfeatures that remain intact after encryption---namely packet size,\ntiming, and direction. We first present what we believe to be the\nfirst exploratory look at protocol identification in encrypted tunnels\nwhich carry traffic from many TCP connections simultaneously, using\nonly post-encryption observable features. We then explore the problem\nof protocol identification in individual encrypted TCP connections,\nusing much less data than in other recent approaches. The results of\nour evaluation show that our classifiers achieve accuracy greater than\n90% for several protocols in aggregate traffic, and, for most\nprotocols, greater than 80% when making fine-grained classifications\non single connections. Moreover, perhaps most surprisingly, we show\nthat one can even estimate the number of live connections in certain\nclasses of encrypted tunnels to within, on average, better than 20%.", + "authors": [ + "Charles V. Wright", + "Fabian Monrose", + "Gerald M. Masson" + ], + "id": "wright06a", + "issue": 99, + "pages": [ + 2745, + 2769 + ], + "title": "On Inferring Application Protocol Behaviors in Encrypted Network Traffic", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/wright06a/wright06a.pdf b/wright06a/wright06a.pdf new file mode 100644 index 0000000..f2ad8a6 Binary files /dev/null and b/wright06a/wright06a.pdf differ diff --git a/wu06a/info.json b/wu06a/info.json new file mode 100644 index 0000000..88d3304 --- /dev/null +++ b/wu06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "Many kernel learning algorithms, including support vector machines,\nresult in a kernel machine, such as a kernel classifier, whose key\ncomponent is a weight vector in a feature space implicitly introduced\nby a positive definite kernel function. This weight vector is usually\nobtained by solving a convex optimization problem. Based on this fact\nwe present a direct method to build sparse kernel learning algorithms\nby adding one more constraint to the original convex optimization\nproblem, such that the sparseness of the resulting kernel machine is\nexplicitly controlled while at the same time performance is kept as\nhigh as possible. A gradient based approach is provided to solve this\nmodified optimization problem. Applying this method to the support\nvectom machine results in a concrete algorithm for building sparse \nlarge margin classifiers. These classifiers essentially find a discriminating\nsubspace that can be spanned by a small number of vectors, and in this\nsubspace, the different classes of data are linearly well\nseparated. 
Experimental results over several classification benchmarks\ndemonstrate the effectiveness of our approach.", + "authors": [ + "Mingrui Wu", + "Bernhard Sch{{\\\"o}}lkopf", + "G{{\\\"o}}khan Bakir" + ], + "id": "wu06a", + "issue": 20, + "pages": [ + 603, + 624 + ], + "title": "A Direct Method for Building Sparse Kernel Learning Algorithms", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/wu06a/wu06a.pdf b/wu06a/wu06a.pdf new file mode 100644 index 0000000..5011d09 Binary files /dev/null and b/wu06a/wu06a.pdf differ diff --git a/yanover06a/info.json b/yanover06a/info.json new file mode 100644 index 0000000..11d1410 --- /dev/null +++ b/yanover06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "

\nThe problem of finding the most probable (MAP) configuration in\ngraphical models comes up in a wide range of applications. In a\ngeneral graphical model this problem is NP hard, but various\napproximate algorithms have been developed. Linear programming (LP)\nrelaxations are a standard method in computer science for\napproximating combinatorial problems and have been used for finding\nthe most probable assignment in small graphical models. However,\napplying this powerful method to real-world problems is extremely\nchallenging due to the large numbers of variables and constraints in\nthe linear program. Tree-Reweighted Belief Propagation is a promising\nrecent algorithm for solving LP relaxations, but little is known about\nits running time on large problems.\n

\n

\nIn this paper we compare tree-reweighted belief propagation (TRBP) and powerful\ngeneral-purpose LP solvers (CPLEX) on relaxations of real-world graphical\nmodels from the fields of computer vision and computational biology. We find\nthat TRBP almost always finds the solution significantly faster than all the\nsolvers in CPLEX and, more importantly, that TRBP can be applied to large-scale\nproblems to which the solvers in CPLEX cannot be applied. Using TRBP we can\nfind the MAP configurations in a matter of minutes for a large range of\nreal-world problems.\n

", + "authors": [ + "Chen Yanover", + "Talya Meltzer", + "Yair Weiss" + ], + "id": "yanover06a", + "issue": 67, + "pages": [ + 1887, + 1907 + ], + "title": "Linear Programming Relaxations and Belief Propagation -- An Empirical Study", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/yanover06a/yanover06a.pdf b/yanover06a/yanover06a.pdf new file mode 100644 index 0000000..cf02520 Binary files /dev/null and b/yanover06a/yanover06a.pdf differ diff --git a/ye06a/info.json b/ye06a/info.json new file mode 100644 index 0000000..26d4952 --- /dev/null +++ b/ye06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "

\nDimensionality reduction is an important pre-processing step in many\napplications. Linear discriminant analysis (LDA) is a classical\nstatistical approach for supervised dimensionality reduction. It aims\nto maximize the ratio of the between-class distance to the\nwithin-class distance, thus maximizing the class discrimination. It\nhas been used widely in many applications. However, the classical LDA\nformulation requires the nonsingularity of the scatter matrices\ninvolved. For undersampled problems, where the data dimensionality is\nmuch larger than the sample size, all scatter matrices are singular\nand classical LDA fails. Many extensions, including null space LDA\n(NLDA) and orthogonal LDA (OLDA), have been proposed in the past to\novercome this problem. NLDA aims to maximize the between-class\ndistance in the null space of the within-class scatter matrix, while\nOLDA computes a set of orthogonal discriminant vectors via the\nsimultaneous diagonalization of the scatter matrices. They have been\napplied successfully in various applications.\n

\n

\nIn this paper, we present a computational and theoretical analysis of\nNLDA and OLDA. Our main result shows that under a mild condition,\nwhich holds in many applications involving high-dimensional data, NLDA\nis equivalent to OLDA. We have performed extensive experiments on\nvarious types of data, and the results are consistent with our theoretical\nanalysis. We further apply regularization to OLDA. The resulting algorithm\nis called regularized OLDA (or ROLDA for short). An efficient\nalgorithm is presented to estimate the regularization value in ROLDA.\nA comparative study on classification shows that ROLDA is very\ncompetitive with OLDA. This confirms the effectiveness of\nregularization in ROLDA.\n

", + "authors": [ + "Jieping Ye", + "Tao Xiong" + ], + "id": "ye06a", + "issue": 42, + "pages": [ + 1183, + 1204 + ], + "title": "Computational and Theoretical Analysis of Null Space and Orthogonal Linear Discriminant Analysis", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/ye06a/ye06a.pdf b/ye06a/ye06a.pdf new file mode 100644 index 0000000..7ab2ad5 Binary files /dev/null and b/ye06a/ye06a.pdf differ diff --git a/zanni06a/info.json b/zanni06a/info.json new file mode 100644 index 0000000..55af388 --- /dev/null +++ b/zanni06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "Parallel software for solving the quadratic program arising in training\nsupport vector machines for classification problems is introduced.\nThe software implements an iterative decomposition technique and exploits\nboth the storage and the computing resources\navailable on multiprocessor systems, by distributing\nthe heaviest computational tasks of each decomposition iteration.\nBased on a wide range of recent theoretical advances,\nrelevant decomposition issues, such as the quadratic\nsubproblem solution, the gradient updating, the working set selection,\nare systematically described and\ntheir careful combination to get an\neffective parallel tool is discussed.\nA comparison with state-of-the-art packages on benchmark problems\ndemonstrates the good accuracy and the remarkable time saving achieved\nby the proposed software. Furthermore, challenging experiments on \nreal-world data sets with millions training samples highlight \nhow the software makes\nlarge scale standard nonlinear support vector machines\neffectively tractable on common multiprocessor systems.\nThis feature is not shown by any of the available codes.", + "authors": [ + "Luca Zanni", + "Thomas Serafini", + "Gaetano Zanghirati" + ], + "id": "zanni06a", + "issue": 53, + "pages": [ + 1467, + 1492 + ], + "title": "Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/zanni06a/zanni06a.pdf b/zanni06a/zanni06a.pdf new file mode 100644 index 0000000..7ae0001 Binary files /dev/null and b/zanni06a/zanni06a.pdf differ diff --git a/zhang06a/info.json b/zhang06a/info.json new file mode 100644 index 0000000..e42ef35 --- /dev/null +++ b/zhang06a/info.json @@ -0,0 +1,17 @@ +{ + "abstract": "An ensemble is a group of learning models that jointly solve a\nproblem. However, the ensembles generated by existing techniques are\nsometimes unnecessarily large, which can lead to extra memory usage,\ncomputational costs, and occasional decreases in effectiveness. The\npurpose of ensemble pruning is to search for a good subset of ensemble\nmembers that performs as well as, or better than, the original\nensemble. This subset selection problem is a combinatorial\noptimization problem and thus finding the exact optimal solution is\ncomputationally prohibitive. Various heuristic methods have been\ndeveloped to obtain an approximate solution. However, most of the\nexisting heuristics use simple greedy search as the optimization\nmethod, which lacks either theoretical or empirical quality\nguarantees. In this paper, the ensemble subset selection problem is\nformulated as a quadratic integer programming problem. By applying\nsemi-definite programming (SDP) as a solution technique, we are able\nto get better approximate solutions. 
Computational experiments show\nthat this SDP-based pruning algorithm outperforms other heuristics in\nthe literature. Its application in a classifier-sharing study also\ndemonstrates the effectiveness of the method.", + "authors": [ + "Yi Zhang", + "Samuel Burer", + "W. Nick Street" + ], + "id": "zhang06a", + "issue": 47, + "pages": [ + 1315, + 1338 + ], + "title": "Ensemble Pruning Via Semi-definite Programming", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/zhang06a/zhang06a.pdf b/zhang06a/zhang06a.pdf new file mode 100644 index 0000000..d274ce4 Binary files /dev/null and b/zhang06a/zhang06a.pdf differ diff --git a/zhao06a/info.json b/zhao06a/info.json new file mode 100644 index 0000000..dc9168c --- /dev/null +++ b/zhao06a/info.json @@ -0,0 +1,16 @@ +{ + "abstract": "

\nSparsity or parsimony of statistical models is crucial for their\nproper interpretation, as in the sciences and social sciences. Model\nselection is a commonly used method to find such models, but it usually\ninvolves a computationally heavy combinatorial search.\nLasso (Tibshirani, 1996) is now being used as a computationally\nfeasible alternative to model selection. It is therefore important\nto study Lasso for model selection purposes.\n

\nIn this paper, we prove that a single condition, which we call the\nIrrepresentable Condition, is almost necessary and sufficient for\nLasso to select the true model both in the classical fixed p \nsetting and in the large p setting as the sample size n \ngets large. Based on these results, sufficient\nconditions that are verifiable in practice are given to relate to\nprevious works and help applications of Lasso for feature selection\nand sparse representation.\n

\nThis Irrepresentable Condition, which depends mainly on the\ncovariance of the predictor variables, states that Lasso selects the\ntrue model consistently if and (almost) only if the predictors that\nare not in the true model are \"irrepresentable\" (in a sense to be\nclarified) by predictors that are in the true model. Furthermore,\nsimulations are carried out to provide insights and understanding of\nthis result.\n

", + "authors": [ + "Peng Zhao", + "Bin Yu" + ], + "id": "zhao06a", + "issue": 89, + "pages": [ + 2541, + 2563 + ], + "title": "On Model Selection Consistency of Lasso", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/zhao06a/zhao06a.pdf b/zhao06a/zhao06a.pdf new file mode 100644 index 0000000..7d5d824 Binary files /dev/null and b/zhao06a/zhao06a.pdf differ diff --git a/zhou06a/info.json b/zhou06a/info.json new file mode 100644 index 0000000..69f2be2 --- /dev/null +++ b/zhou06a/info.json @@ -0,0 +1,18 @@ +{ + "abstract": "In streamwise feature selection, new features are sequentially\nconsidered for addition to a predictive model. When the space of\npotential features is large, streamwise feature selection offers\nmany advantages over traditional feature selection methods, which\nassume that all features are known in advance. Features can be\ngenerated dynamically, focusing the search for new features on\npromising subspaces, and overfitting can be controlled by\ndynamically adjusting the threshold for adding features to the\nmodel. In contrast to traditional forward feature selection\nalgorithms such as stepwise regression in which at each step all\npossible features are evaluated and the best one is selected,\nstreamwise feature selection only evaluates each feature once when\nit is generated. We describe information-investing and\nα-investing, two adaptive complexity penalty methods for\nstreamwise feature selection which dynamically adjust the threshold\non the error reduction required for adding a new feature. These two\nmethods give false discovery rate style guarantees against\noverfitting. They differ from standard penalty methods such as AIC,\nBIC and RIC, which always drastically over- or under-fit in the\nlimit of infinite numbers of non-predictive features. Empirical\nresults show that streamwise regression is competitive with (on\nsmall data sets) and superior to (on large data sets) much more\ncompute-intensive feature selection methods such as stepwise\nregression, and allows feature selection on problems with millions\nof potential features.", + "authors": [ + "Jing Zhou", + "Dean P. Foster", + "Robert A. Stine", + "Lyle H. Ungar" + ], + "id": "zhou06a", + "issue": 66, + "pages": [ + 1861, + 1885 + ], + "title": "Streamwise Feature Selection", + "volume": "7", + "year": "2006" +} \ No newline at end of file diff --git a/zhou06a/zhou06a.pdf b/zhou06a/zhou06a.pdf new file mode 100644 index 0000000..a093ab1 Binary files /dev/null and b/zhou06a/zhou06a.pdf differ