Archive for June, 2005

Automated Problem Determination Using Call-Stack Matching

by Brodie, Mark; Ma, Sheng; Rachevsky, Leonid; Champlin, Jon

We present an architecture and algorithms for performing automated software problem determination using call-stack matching. In an environment where software is used by a large user community, the same problem may recur many times. We show that such recurrences can be detected by matching the program call-stack against a historical database of call-stacks, so that once a problem has been resolved, future cases of the same or similar problems can be resolved automatically. This would greatly reduce the number of cases that need to be dealt with by human support analysts. We also show how a call-stack matching algorithm can be automatically learned from a small sample of call-stacks labeled by human analysts, and examine the performance of this learning algorithm on two different data sets.
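As a rough illustration of the matching idea, the sketch below compares a new call stack with stored ones and reports the closest known problem. It is not the paper's method: the abstract does not specify the learned matching algorithm, so Python's `difflib.SequenceMatcher` stands in for it, and the function names, problem IDs, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def stack_similarity(stack_a, stack_b):
    """Return a similarity in [0, 1] between two lists of frame names."""
    # autojunk=False keeps the heuristic from discarding frequent frames.
    return SequenceMatcher(None, stack_a, stack_b, autojunk=False).ratio()


def best_match(stack, known_stacks, threshold=0.8):
    """Return (problem_id, score) for the closest stored stack.

    problem_id is None when no stored stack reaches the threshold.
    The threshold value is an assumption made for this sketch.
    """
    best_id, best_score = None, 0.0
    for problem_id, known in known_stacks.items():
        score = stack_similarity(stack, known)
        if score > best_score:
            best_id, best_score = problem_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Hypothetical database of previously resolved problems.
known_stacks = {
    "PR-1": ["main", "parse", "read_file", "memcpy"],
    "PR-2": ["main", "render", "draw", "free"],
}

new_stack = ["main", "parse", "validate", "read_file", "memcpy"]
match_id, match_score = best_match(new_stack, known_stacks)
```

A learned matcher would replace the fixed similarity function and threshold with parameters fitted to analyst-labeled stacks.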

DOI: 10.1007/s10922-005-4443-8
Print publication date: 6/1/2005
View article on SpringerLink

Forthcoming Contributions

DOI: 10.1007/s10922-005-6426-1
Print publication date: 6/1/2005
View article on SpringerLink

CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays

by Voulgaris, Spyros; Gavidia, Daniela; Steen, Maarten

Unstructured overlays form an important class of peer-to-peer networks, notably where content-based searching is required. The construction of these overlays, which is essentially a membership management issue, is crucial. Ideally, the resulting overlays should have low diameter and be resilient to massive node failures, which are both characteristic properties of random graphs. In addition, they should be able to deal with high node churn (i.e., frequent membership changes). Inexpensive membership management that retains random-graph properties is therefore important. In this paper, we describe a novel gossip-based membership management protocol that meets these requirements. Our protocol is shown to construct graphs that have low diameter, low clustering, highly symmetric node degrees, and that are highly resilient to massive node failures. Moreover, we show that the protocol quickly restores randomness when a large number of nodes fail.
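The abstract does not spell out the protocol, so the following is only a generic sketch of gossip-based view exchange, not CYCLON itself. Each node keeps a small bounded list of neighbors and periodically swaps a few entries with one of them. The function names, the `swap_len` and `capacity` parameters, and the random choice of exchange partner are assumptions made for illustration.

```python
import random


def merge(view, received, sent, owner, capacity):
    """Add received peers to view, evicting peers we just sent if full."""
    result = list(view)
    for peer in received:
        if peer == owner or peer in result:
            continue  # never store ourselves or a duplicate
        if len(result) < capacity:
            result.append(peer)
            continue
        evictable = [m for m in result if m in sent]
        if not evictable:
            break  # view is full and nothing may be dropped
        result[result.index(evictable[0])] = peer
    return result


def shuffle(views, p, q, swap_len, capacity, rng):
    """Exchange up to swap_len entries between node p and its neighbor q."""
    own = [n for n in views[p] if n != q]  # p gives up its link to q
    sent_p = [p] + rng.sample(own, min(swap_len - 1, len(own)))
    sent_q = rng.sample(views[q], min(swap_len, len(views[q])))
    views[p] = merge(own, sent_q, sent_p, p, capacity)
    views[q] = merge(views[q], sent_p, sent_q, q, capacity)


def run_rounds(views, rounds, swap_len, capacity, seed=0):
    """Let every node with a non-empty view start one exchange per round."""
    rng = random.Random(seed)
    for _ in range(rounds):
        for p in list(views):
            if views[p]:
                q = rng.choice(views[p])
                shuffle(views, p, q, swap_len, capacity, rng)
    return views


# Start from a regular ring-like overlay of 8 nodes and let it mix.
views = {i: [(i + 1) % 8, (i + 2) % 8] for i in range(8)}
run_rounds(views, rounds=20, swap_len=2, capacity=3)
```

Because each exchange touches only two nodes and a handful of entries, the per-node cost stays constant regardless of overlay size, which is the sense in which such membership management is inexpensive.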

DOI: 10.1007/s10922-005-4441-x
Print publication date: 6/1/2005
View article on SpringerLink

Characterizing and Predicting Resource Demand by Periodicity Mining

by Andrzejak, Artur; Ceyran, Mehmet

We present algorithms for characterizing the demand behavior of applications and predicting demand by mining periodicities in historical data. Our algorithms are change-adaptive, automatically adjusting to new regularities in demand patterns while maintaining low algorithm running time. They are intended for applications in scientific computing clusters, enterprise data centers, and Grid and Utility environments that exhibit periodic behavior and may benefit significantly from automation. A case study incorporating data from an enterprise data center is used to evaluate the effectiveness of our technique.
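One common way to find a period in a demand trace is autocorrelation; the sketch below uses it purely as an illustration, since the abstract does not say which mining technique the paper applies. The function names and the sample trace are hypothetical.

```python
def autocorrelation(series, lag):
    """Unnormalized autocovariance of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    return sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )


def dominant_period(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    lags = range(1, min(max_lag, len(series) - 1) + 1)
    return max(lags, key=lambda lag: autocorrelation(series, lag))


def predict_next(series, period):
    """Naive forecast: repeat the value observed one period ago."""
    return series[-period]


# Hypothetical demand trace that repeats every 4 samples.
demand = [1, 5, 2, 0] * 6
period = dominant_period(demand, max_lag=12)
forecast = predict_next(demand, period)
```

Recomputing the period over a sliding window of recent samples is one simple way to follow changes in the demand pattern, in the spirit of the change-adaptive behavior the abstract describes.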

DOI: 10.1007/s10922-005-4440-y
Print publication date: 6/1/2005
View article on SpringerLink

Design and Implementation of a Resource Manager in a Distributed Database System

by Bobroff, Norman; Mummert, Lily

This paper describes a system called Trends for managing IT resources in a production server environment. The objective of Trends is to reduce operational costs associated with unplanned outages, unbalanced utilization of resources, and inconsistent service delivery. The Trends resource manager balances utilization of multiple resources such as processor and disk space, manages growth to extend resource lifetimes, and factors in variability to improve temporal stability of balancing solutions. The methodology applies to systems in which workload has a strong affinity to databases, files, or applications that can be selectively placed on one or more nodes in a distributed system. Studies in a production environment demonstrate that balancing solutions remain stable for as long as the 9–12 months covered by our data. This work takes place in the context of the Lotus Notes distributed database system, and is based on analysis and data from a production server farm hosting over 20,000 databases.
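To make the balancing idea concrete, here is a heavily simplified sketch that places databases on servers so that a single load figure is spread evenly. It uses a greedy largest-first heuristic, which is an assumption of this example and not the Trends algorithm; Trends, per the abstract, also handles multiple resources, growth, and variability, none of which are modeled here. All names and load values are hypothetical.

```python
import heapq


def place_databases(database_loads, servers):
    """Assign each database to the currently least-loaded server.

    Databases are considered from largest to smallest load, a common
    greedy heuristic for balancing a single resource.
    """
    heap = [(0, server) for server in servers]  # (total load, server name)
    heapq.heapify(heap)
    assignment = {}
    by_size = sorted(database_loads.items(), key=lambda kv: kv[1], reverse=True)
    for database, load in by_size:
        total, server = heapq.heappop(heap)
        assignment[database] = server
        heapq.heappush(heap, (total + load, server))
    return assignment


def server_totals(database_loads, assignment):
    """Sum the load placed on each server that received a database."""
    totals = {}
    for database, server in assignment.items():
        totals[server] = totals.get(server, 0) + database_loads[database]
    return totals


# Hypothetical per-database load figures (for example, percent of CPU).
loads = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
assignment = place_databases(loads, ["s1", "s2"])
totals = server_totals(loads, assignment)
```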

DOI: 10.1007/s10922-005-4439-4
Print publication date: 6/1/2005
View article on SpringerLink

Self-Managing Systems and Networks

by Keller, Alexander; Brunner, Marcus

DOI: 10.1007/s10922-005-4438-5
Print publication date: 6/1/2005
View article on SpringerLink

Moving or Merging IT Departments? The Hardware and Software are the Easy Parts

by Klein, Eric

Although the technical complications in moving or merging a data or telecom department are considerable, it is the human considerations that are most critical to the success of the change. This article presents several methods for handling these issues, with real-world examples, in order to prevent problems with the change.

DOI: 10.1007/s10922-005-4437-6
Print publication date: 6/1/2005
View article on SpringerLink
