System Design Document

 

PAID Project

15-413 Software Engineering

Fall 1998

Carnegie Mellon University

Pittsburgh, PA 15213

 


Revision History:

Version 0.9 10/13/98 Bernd Bruegge. Created
Version 0.91 11/12/98 Dan McCarriar. Integrated Sections 1-6
Version 0.92 11/19/98 Dan McCarriar. Added Sections 7 and 8, changes from Authentication team.
Version 1.0 11/25/98 Dan McCarriar. Incorporated changes from Architecture team and new diagrams.

Preface:

This document addresses the requirements of the PAID system. The intended audience for this document is the designers and clients of the project.

PAID Members:

Bernd Bruegge, Elizabeth Bigelow, Elaine Hyder, Robin Loh, Jack Moffett, Eric Stein, Keith Arner, Swati Gupta, Russell Heywood, Joyce Johnstone, Luis Alonso, Orly Canlas, David Garmire, Jonathan Hsieh, James Lampe, Yun-Ching Lee, Wing Ling Leung, Kent Ma, Georgios Markakis, Richard Markwart, Dan McCarriar, Reynald Ong, Adam Phelps, Arnaldo Piccinelli, Euijung Ra, Qiang Rao, William Ross, Pooja Saksena, Rudy Setiawan, Timothy Shirley, Michael Smith, Barrett Trask, Ivan Tumanov, Anthony Watkins, Jonathan Wildstrom, Brian Woo, Stephane Zermatten, Andrew Zimdars

Table of Contents:

  1. Goals and Tradeoffs
  2. System Specification
  3. Concurrency Identification
  4. Hardware/Software Allocation
  5. Data Management
  6. Global Resource Handling
  7. Software Control Implementation
  8. Boundary Conditions
  9. Design Rationale


1. Goals and Tradeoffs

The goal of the Platform for Active Information Dissemination (PAID) project is to design an infrastructure to enable the timely distribution of information to affiliates of Daimler-Benz, Inc. Utilizing both the public Internet and private corporate networks, PAID will allow Daimler-Benz to disseminate after-sales information in a secure but robust manner to any affiliates that have a need to receive the information. This document will present the technical details of the PAID system design. More information about the specific features and the motivation for PAID can be found in the Requirements Analysis Document (RAD) and the Problem Statement.

PAID will distribute information, as required by affiliates of Daimler-Benz, in a fast and efficient manner. The system will eliminate the need for monthly information updates on CD-ROM. Once in place, PAID will significantly decrease information entry and administrative costs. The extensible design of the system will enable the addition of new databases and data types, as well as their distribution. PAID will have a fast response time, improving the end users' ability to serve their customers efficiently. Information transmitted through the PAID system will be secured by means of a smart card, ensuring that the information remains confidential and that access is limited to authorized personnel. Finally, PAID will give affiliates of Daimler-Benz access to the most up-to-date information available, greatly improving on the current scenario in which information is distributed on a monthly basis.

From a system design perspective, PAID is highly extensible. The architecture allows new databases and functionality to be added without a complete redesign of the system. PAID is also scalable, allowing for increasing volume of data transfer as well as the addition of new dealers to the network. The PAID system is designed to be location transparent. PAID will be a reliable system, within the boundaries of the reliability requirements set forth in the problem statement and specified by the client. The adaptability and learning behavior in PAID will improve the efficiency and response time of information access on the end-user systems.

There are, of course, some tradeoffs that must be considered when trying to reach these goals. First, to maintain the scalability and extensibility of PAID, it is desirable to have database independence. To allow the use of legacy data that may be stored in a large number of disparate database systems, PAID should not require data to come from a single vendor's database. Technologies such as JDBC allow easy access to nearly every type of database system that might be used with PAID. In the near term, however, there may be a performance penalty associated with the use of JDBC. This should become less of an issue as systems become more powerful and JDBC technology matures. Similarly, since the PAID system as a whole will be implemented in Java, there may be performance issues early on. We believe that the rapid adoption of Java throughout the industry, together with the potential that Java provides for reuse of objects and for the extensibility of the PAID system in general, outweighs these early performance problems.

The implementation of routing efficiency routines will introduce additional computational complexity, increasing processing time on the servers that feed information to the PAID system. Since building behavior models is a computationally intensive task that could slow the PAID system while it is in use, modeling tasks will take place in a separate thread that runs at an interval specified by the system administrator. While this method sacrifices accuracy as the interval between model rebuilds grows, it allows for near-instantaneous processing of actions based on the most recent behavior model. We believe that the improved response time that this will afford the end user outweighs the increase in computing power and system configuration that will be necessary for the information servers.

Finally, the smart card technology that the PAID system uses will offer enhanced security by means of a long key stored on the card. However, the use of a smart card introduces complexity in some areas. In terms of authentication, the smart card is a departure from the traditional user name and password combination that is currently used. The use of a smart card will require the design of new security procedures at the end-user level, depending on the number of smart cards that will be available to each Daimler-Benz affiliate. The use of smart cards also increases the hardware requirements for the deployment of PAID, as each system that needs access to data through the PAID system will require a smart card reader. Since security of data is critical to the success of PAID, we believe that the benefit of using smart cards for authentication clearly outweighs the associated complexity.

The remainder of this document describes the architecture of a prototype of the PAID system that can be used as the basis for future implementation of the described infrastructure.

 

2. System Specification

2.1 System Architecture

In the PAID architecture shown in Figure 3, the subsystems as a whole allow for a hierarchy of COMET (Collaborative Object Management Extensible Transporter) servers, which are used to disseminate information to the end-user systems. From the STAR Network, a group of COMET servers receives updates that are stored and distributed to servers below them in the hierarchy. Dealer servers will receive these updates and execute them on their own databases. For future scalability, more COMET servers can be inserted between the dealers and the STAR Network.

2.2 System Decomposition

The following subsystems were identified for the implementation of PAID functionality. Each subsystem performs a well-defined task. Subsystems interact with each other to achieve the necessary PAID tasks. The subsystems are:

2.2.1 Learning

The proposed learning subsystem will watch for "triggers" (data transactions) from the COMET Server database. The system will selectively analyze this data and determine a more intelligent, streamlined and efficient way to handle future data transactions.

2.2.2 Database

The primary purpose of the database subsystem is to engineer a way for data to be stored and replicated in an efficient manner on computers utilizing the PAID system. These include the main database servers at Daimler-Benz, COMET servers, dealership servers, and individual dealer clients. A sub-task is the storage of persistent data for other PAID subsystems, and the uploading of sales, marketing, customer and vehicle information up to the main servers.

2.2.3 Authentication

The authentication subsystem will provide secure access to the data made available by the PAID system. This will be done, in part, by restricting access on a group membership basis. In addition, encryption of data transmissions across networks will be used. Finally, a system for creating and managing users, groups, and access rights will be included.

2.2.4 User Interface

The User Interface Subsystem will provide a set of user-friendly graphical user interfaces that deal with the 8 scenarios presented in the problem statement.

2.2.5 Network

The PAID Network subsystem will be designed to provide a method of communication between a dealer and the central database, as well as providing communication between the other individual subsystems of PAID. Specifically, the Network subsystem will allow a remote user to request information from the central database, and to receive updated information from the central database.

2.2.6 Event Service

The Event Service subsystem multicasts events that are possibly of interest to all subsystems. These events include servers going up and down and notification that new updates are available.

2.3 Layers & Partitions


Figure 1: PAID subsystem dependencies

The dependencies in the PAID subsystem interactions are displayed in Figure 1. The diagram describes the dependsOn association between subsystems and the partial hierarchy inherent in some of these interactions. The dependencies are represented as arrows originating from the dependent subsystem.

This dependency diagram can also be displayed as a layered diagram.


Figure 2: Layered dependency diagram

 

2.4 System Topology

The component deployment differs between COMET servers and dealer servers. The COMET servers contain the learning and event functionality, which is absent from the dealer servers. The subsystem interactions are modeled below.

 


Figure 3: PAID System Topology

3. Concurrency Identification

The only concurrent functionality in the PAID subsystems occurs in the Learning subsystem. Learning uses a data miner to analyze the captured user logs and determine optimizing behaviors. This results in the only shared object in the current design, the BehaviorFile, which is updated only by the Learning subsystem and provides other subsystems with intelligent recommendations. Database locking mechanisms and transactions will handle concurrent accesses to the BehaviorFile.

All other subsystems allow for multiple users by maintaining a different session for each user. These sessions can run concurrently. The database residing on the COMET servers is capable of servicing these sessions based on built-in transaction level simultaneous access. Outside of the database, there is no shared information or variables between the sessions.

Event Service acts only as the channel for communication among the subsystems. It does not interfere with the workings of other subsystems and therefore runs concurrently with all other subsystems. Although the User Interface subsystem may broadcast events of user actions, users have no direct interaction with Event Service or any other subsystem besides User Interface or Authentication.

In summary:

 

4. Hardware/Software Allocation

Based on client feedback, our target deployment platform is Windows NT. Development occurs on both Windows NT and Linux platforms. For the software bus of the PAID system, we will be using Voyager from ObjectSpace. Voyager provides the ability for rapid prototyping and CORBA interaction.

For connection to the database, we will be using JDBC to allow for database independence.

4.1 System Performance

4.1.1 General System Performance

The query response time goal for the dealer's local PAID database is under 1 second for over 90% of the requests. For any network requests, PAID aims at a maximum response time of one minute, not including time to connect to the network. However, PAID cannot guarantee network response time, due to circumstances beyond our control (e.g., heavy network traffic).

Since we have not been provided with sample data and data volumes, we have been unable to determine data transaction volumes.

4.1.2 Processor & Memory Allocation

The data mining performed by the learning subsystem is processor and memory intensive and may require a separate machine. Depending on system loads, additional resources may be necessary. The Database subsystem will maintain routing tables and provide an API to the Network subsystem to access those routing tables. Network will use the routing tables to route requests to servers that have the requested information available and that are idle enough to handle the request. Without data and data volumes, it is difficult to determine specific hardware requirements.
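The routing decision described above can be illustrated with a minimal Java sketch. It is not the PAID implementation: the class name RoutingTableSketch, the ServerEntry fields, the server names, and the load threshold are all hypothetical, and load is assumed to be reported as a number between 0 and 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: choose a server that holds the requested data
// and is idle enough to handle the request.
final class RoutingTableSketch {

    // One routing-table row; all fields are illustrative assumptions.
    static final class ServerEntry {
        final String name;
        final Set<String> subsets; // data subsets held by this server
        final double load;         // assumed load in [0.0, 1.0]

        ServerEntry(String name, Set<String> subsets, double load) {
            this.name = name;
            this.subsets = subsets;
            this.load = load;
        }
    }

    private final List<ServerEntry> entries = new ArrayList<>();
    private final double maxLoad; // servers above this load are skipped

    RoutingTableSketch(double maxLoad) {
        this.maxLoad = maxLoad;
    }

    void add(ServerEntry entry) {
        entries.add(entry);
    }

    // Returns the least-loaded eligible server, or empty if none qualifies.
    Optional<String> route(String subset) {
        ServerEntry best = null;
        for (ServerEntry e : entries) {
            boolean eligible = e.subsets.contains(subset) && e.load <= maxLoad;
            if (eligible && (best == null || e.load < best.load)) {
                best = e;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.name);
    }

    public static void main(String[] args) {
        RoutingTableSketch table = new RoutingTableSketch(0.8);
        table.add(new ServerEntry("comet-a", Set.of("passenger"), 0.9));
        table.add(new ServerEntry("comet-b", Set.of("passenger", "utility"), 0.3));
        System.out.println(table.route("passenger").orElse("none")); // prints "comet-b"
    }
}
```

Picking the least-loaded eligible server is only one possible policy; the document does not specify how "idle enough" is measured.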

4.2 Connectivity

The connectivity between subsystems consists of Voyager for remote method invocations between servers, RMI for remote method invocations from a client machine to a server, and JDBC from the Database subsystem to the database itself. JDBC is used to maintain database independence. In the topology diagram (see Figure 3), all unlabeled connections are made using Voyager.

4.3 Network Architecture

PAID uses the TCP/IP protocol for the communications. The Daimler-Benz extranet and the public Internet are used to achieve the connectivity between COMET servers and the dealer servers. At the dealer level, a local Ethernet or similar TCP/IP enabled network exists for connectivity between local machines and the dealer server.

 

5. Data Management

5.1 Types of Data

Some examples of data that the PAID system will deal with are:

These data will all be stored in databases. For the development phase of this project, we will be using Interbase 5.0 (running on Red Hat Linux 4.2 platform) as our database server.

5.2 Data Distribution

The data will be distributed across three types of machines. These machines can be categorized as follows:

Different types of data are distributed across the various machines in different ways.

5.3 Data Organization

The data will be divided into subsets. These subsets could be, for example, classes of vehicles, locations where vehicles were sold, or dates of sale. Each user request for data will require access to one or more of these subsets. The data subsystem will determine which subsets are needed to satisfy the requests. Each of the COMET and dealer servers will hold one or more of these subsets, and every subset will exist on at least one COMET server.

Thus, when data in a particular subset is updated, that update will be sent via the IOUs to the COMET servers that hold that subset. Then, the COMET servers will send the updates via IOU to the dealer servers, or queue them if a dealer server is off-line or otherwise unavailable.
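The forward-or-queue behavior for dealer servers can be sketched as follows. This is an illustration under stated assumptions only: DealerLink and its methods are hypothetical names, updates are represented as plain strings rather than IOUs, and the sketch is single-threaded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a COMET server forwards an update to a dealer server,
// or queues it while that dealer server is off-line.
final class UpdateForwarderSketch {

    // Stand-in for one dealer server as seen from a COMET server.
    static final class DealerLink {
        private final Deque<String> pending = new ArrayDeque<>();
        private final List<String> delivered = new ArrayList<>();
        private boolean online;

        DealerLink(boolean online) {
            this.online = online;
        }

        // Deliver immediately when on-line; otherwise keep the update queued.
        void send(String update) {
            if (online) {
                delivered.add(update);
            } else {
                pending.addLast(update);
            }
        }

        // Called when the dealer server becomes reachable again:
        // flush queued updates in their original order.
        void reconnect() {
            online = true;
            while (!pending.isEmpty()) {
                delivered.add(pending.removeFirst());
            }
        }

        List<String> delivered() {
            return delivered;
        }

        int pendingCount() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        DealerLink dealer = new DealerLink(false);
        dealer.send("update-1");
        dealer.send("update-2");
        System.out.println("queued: " + dealer.pendingCount());
        dealer.reconnect();
        System.out.println("delivered: " + dealer.delivered());
    }
}
```

Flushing in arrival order is an assumption made here so that later updates never overtake earlier ones for the same subset.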

5.4 Server Loads

In order to calculate expected loads on the servers, we have made the following simplifying assumptions:

The worst case we need to consider is the case where all 6000 dealers access the same COMET server at the same time. In the average case, the number of accesses per second to a single COMET server is [(# of dealers that use that server) * (how many times per day data is not on the dealer's machine) / (# of seconds in a day)].
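The average-case estimate can be written out as a small Java helper. The method and parameter names are hypothetical, the miss count is assumed to be per dealer per day, and the inputs in main are placeholders rather than measured PAID figures.

```java
// Hypothetical sketch of the average-case load estimate for one COMET server.
final class CometLoadSketch {

    static final int SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

    // Average accesses per second seen by one COMET server.
    static double averageAccessesPerSecond(int dealersOnServer, double missesPerDealerPerDay) {
        return dealersOnServer * missesPerDealerPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        // Placeholder inputs: 600 dealers, each missing locally 10 times a day.
        System.out.println(averageAccessesPerSecond(600, 10.0));
    }
}
```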

5.4.1 FDOK Data Traffic

The following table shows representative figures for FDOK data traffic. The updates shown are a total over 5 weeks.

Vehicle     Area            Update Size   Total Size   # Vehicles
Utility     Europe w/o      9.8 MB        43 MB        23000
Utility     Germany         9.4 MB        31 MB        15000
Utility     Others          0.8 MB        2.1 MB       2000
Passenger   Europe w/o      2.2 MB        14.5 MB      35000
Passenger   Germany         3.8 MB        20 MB        48000
Passenger   America (n/s)   1.2 MB        10 MB        21000
Passenger   Others          0.2 MB        0.9 MB       2000

5.4.2 EPC Data Traffic

Typical traffic for the Electronic Parts Catalog would be 3 MB/week for database updates, and 7 MB/week for image updates.

5.4.3 Other Data Traffic

The number of database accesses expected per day per dealer for the other types of data is represented by the following two tables. The first table represents the number of times data is actually retrieved from the specified machine. The second table shows the number of times the specified machine is "touched" (for example, by checking to see if the data is stored on that machine; whether or not it is found, this counts as a "touch").

 

Database Accesses Per Dealer   User Info                         Server Info   Misc Persistent Data
Local Server                   4 / day                           45 / day      TBD
COMET server                   1 / day (update your user info)   1 / day       TBD

 

Touches Per Dealer   User Info                         Server Info   Misc Persistent Data
Locally              4 / day                           45 / day      TBD
COMET server         1 / day (update your user info)   1 / day       TBD

5.5 Archiving Requirements

The FDOK and EPC data do not have to be archived by PAID, as the archiving of these data will be handled by the STAR Network. Updates to data which are made by PAID systems need to be archived until such time as they are received by and integrated into the STAR Network.

User Info and Server Data need to be archived. Miscellaneous persistent data doesn't need to be backed up, given that we will not be backing up data on local machines.

5.6 Other Data Management Info

Data integrity is the paramount goal of the data management system. Since manual recovery of the data is extremely costly, it is imperative that we prevent data loss. PAID will use a proven database that supports JDBC, and the data management system will ensure that proper relationships within this database are maintained.

The system will be extensible in the sense that its design will allow the addition of new data types. The addition of new data types, such as non-streaming video, will not require an upgrade to the database subsystem, although upgrades to other subsystems such as User Interface may be required. This is because the database does no processing of the data. If the structure of the database changes, however, a software upgrade will be required. This extensibility allows us to easily adapt to new requirements and leverage new technologies. As a requirement for extensibility, the system will also not be tied to any particular vendor or platform.

The location of the data within the system will be transparent to the other applications. If it is remotely accessed, the physical location of the data is contained within the Server Data. The subsystem also presents a single interface to the rest of the applications that wish to access it. This interface is specified in the database API Documentation.

Data in its raw format is relational. From the perspective of the applications accessing the data, the data is object-oriented because it is returned as objects. This makes it easier to interface with Java applications.

 

6. Global Resource Handling

The PAID system is designed for global usage by different users for distinct purposes.  Therefore it is critical to organize resources in a simple and convenient manner so that the system can be used efficiently. In addition, it is critical to develop a system of authentication that guarantees secure transactions and uncompromised data.

There will be two main categories of users of the system, dealers and Daimler-Benz administrators. The dealers are made up of two subgroups, the affiliated and the non-affiliated dealers. The affiliated dealers will have access to PAID via the Daimler-Benz extranet, while the non-affiliated dealers will access COMET servers through the public Internet and will have more restricted access. Daimler-Benz administrators represent the employees who will keep the PAID system operational at the various Daimler-Benz sites. They will be responsible for keeping COMET servers running, ensuring database integrity, and handling user control and access rights.

Each PAID user will be assigned an individual Smart Card. The Smart Card will contain a user ID and a key that is unique to every user. With this data, the Smart Card acts as a normal user name/password authentication scheme. Using the Smart Card to store a password provides increased security over normal passwords since the card can store an extremely long string, which is much harder to guess.

Authentication will be implemented on an application level. Different interfaces of the application will be available to different users depending on their access rights. Each user group will have a specific interface of the application from which they can execute pre-defined queries.
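A minimal Java sketch of application-level access control by group membership follows. The class, the group names, and the window names are hypothetical; the document does not define the actual rights model, so this only illustrates the idea that a user's group determines which parts of the application may be opened.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: application-level access control by group membership.
final class AccessRightsSketch {

    private final Map<String, String> groupOfUser = new HashMap<>();
    private final Map<String, Set<String>> windowsOfGroup = new HashMap<>();

    void assignUser(String userId, String group) {
        groupOfUser.put(userId, group);
    }

    void grant(String group, String window) {
        windowsOfGroup.computeIfAbsent(group, g -> new HashSet<>()).add(window);
    }

    // True only if the user's group has been granted the given window.
    boolean mayOpen(String userId, String window) {
        String group = groupOfUser.get(userId);
        if (group == null) {
            return false; // unknown users get no access
        }
        Set<String> windows = windowsOfGroup.get(group);
        return windows != null && windows.contains(window);
    }

    public static void main(String[] args) {
        AccessRightsSketch rights = new AccessRightsSketch();
        rights.assignUser("dealer-17", "affiliated-dealer");
        rights.grant("affiliated-dealer", "parts-catalog");
        System.out.println(rights.mayOpen("dealer-17", "parts-catalog")); // true
        System.out.println(rights.mayOpen("dealer-17", "user-admin"));    // false
    }
}
```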

There will be multiple data sources available to the end users for initial installation and updates. These data sources range from databases published on CDs or DVDs to databases containing dynamic information that reside on COMET servers. Each of these sources will be protected from unauthorized access in a way that is both secure and simple to use.

The first data sources to consider are the COMET servers that are contained in the STAR Network. These servers must be protected from unauthorized access since they contain data that is both proprietary and confidential. In addition to protecting the databases themselves from unauthorized access, the transmissions of the data from these databases to the dealerships must be protected.

Affiliated dealers and Daimler-Benz administrators will always access the COMET servers via the Daimler-Benz extranet. The security on the extranet and the users themselves are trusted. Therefore, transmissions to and from these dealers do not need to be verified by the PAID system.

When a non-affiliated dealer accesses a COMET server, their communications must be secured through encryption techniques. Kerberos will form the framework for protecting these communications. Kerberos is a trusted method of authentication, and is fast enough to be used in this system. In short, a ticket is sent to the user's machine, which is then used to encrypt a message containing their username and password. If the user's login is accepted by the server, a system is set up in which all resulting communications will be encrypted. Each transmission will use a new key, making it more difficult to break the encoding of the transmissions.

The second data source to consider is the dealer's machine itself. A Daimler-Benz affiliated dealer who uses the PAID system will have the option of storing data retrieved from Daimler-Benz servers via the networks. Local data will reside in one of two places: a local store, which is a database, or a cache, which holds data that has been downloaded but is not part of the local store. In addition to these, new installations of PAID may be run using CDs or DVDs as a data source. These sources will contain database information, so access to them will be implemented in the same way as access to the dealer's local store.

When PAID is started, the user is asked to insert their Smart Card. When a card is inserted in the reader, the system will attempt to verify the user by reading the User ID and password stored on the card and matching it to a record in the local store or COMET server. This information will be used to begin the user's session on the client machine. The user will then be provided with an interface that provides them with access to those parts of the data that they are able to see. If the card is removed from the reader, the system will end the user's session.

 

7. Software Control Implementation

7.1 External Control Flow (between subsystems)

Each of the subsystems for the PAID project will run as separate processes. As such, each will run independently of the others. The inter-subsystem control flow will occur based on a mixture of remote method invocation and asynchronous events.

Authentication: The authentication subsystem responds to requests to verify a user's ability to access a certain part of the PAID application (i.e. a specific window). This will be done by exposing a method to the User Interface subsystem. Authentication will start a user's session by calling methods from the User Interface.

Database: The database subsystem responds to queries for data. The subsystem will wait in an event loop for queries arriving either through a remote method invocation or through events passed through the event service. When a request is received, the database will search for the appropriate information and return it if the information is available. If necessary, it may invoke the Network subsystem to retrieve information that is not stored locally.

Network: The network subsystem does not specifically interact with other subsystems. Instead, it accepts interactions from any of the other subsystems that may want to make a network transfer. Subsystems can have their data pushed or pulled through a channel to a subsystem (not necessarily the same) on another computer.

User Interface: The user interface subsystem interacts with the Authentication, Events and Database subsystems. When a user first attempts to use the client software to access data, the user interface will interact with authentication by asking if the user has the right to see the application they are trying to open. Once access rights are established, the user makes requests for information by either posting a request in an event, or by querying the database directly.

Events: The event subsystem, like the network subsystem, interacts with no specific subsystem; instead, other subsystems interact with it. Another subsystem can subscribe to receive events from a particular channel, or publish information in a particular channel. When information is published, the event subsystem sends the information to those that have subscribed to receive the data.

Learning: Learning will make calls to the database subsystem on the COMET servers to log document downloads, record changes in user preferences, and look up downloads (for data mining) and preferences (for recommendations). Just before initiating a transfer, the network subsystem on the COMET server will ask the learning subsystem to recommend the eventual target of the document (e.g., the dealer cache or the dealer database). This information will pass to the dealer-server side.
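The channel-based subscribe and publish interaction described for the event subsystem can be sketched in Java as follows. This is an in-process illustration only: the class and method names are hypothetical, events are plain strings, and the real system would deliver events between processes over the software bus rather than through direct callbacks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of channel-based publish/subscribe within one process.
final class EventChannelSketch {

    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Register a callback for every event published on the channel.
    synchronized void subscribe(String channel, Consumer<String> subscriber) {
        subscribers.computeIfAbsent(channel, c -> new ArrayList<>()).add(subscriber);
    }

    // Deliver the event to every subscriber of the channel.
    // Returns the number of subscribers that were notified.
    int publish(String channel, String event) {
        List<Consumer<String>> targets = new ArrayList<>();
        synchronized (this) {
            List<Consumer<String>> registered = subscribers.get(channel);
            if (registered != null) {
                targets.addAll(registered); // copy so callbacks run outside the lock
            }
        }
        for (Consumer<String> subscriber : targets) {
            subscriber.accept(event);
        }
        return targets.size();
    }

    public static void main(String[] args) {
        EventChannelSketch events = new EventChannelSketch();
        events.subscribe("updates", e -> System.out.println("database received: " + e));
        events.publish("updates", "new update available");
    }
}
```

Publishers and subscribers never reference each other directly, which matches the description of the event subsystem as a neutral channel.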

7.2 Concurrent control

The subsystems operate independently of one another, which requires each to run concurrently with the others. In some cases, concurrency cannot exist due to the state of the system at that time. For more information on the concurrency of objects within each subsystem, see Section 3.

7.3 Internal Control (within a single process)

Internal control in all PAID subsystems is achieved using procedure calls.

The Network, Authentication and Database subsystems do not require multiple threads beyond their event loop, which waits for messages to be received. When a message is received, the subsystem may spawn a thread to handle that request so that the event loop can resume listening for requests.
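A minimal Java sketch of such an event loop is shown below, assuming requests arrive on an in-memory queue. The class name, the string requests, and the stop sentinel are hypothetical; a thread pool stands in for "spawning a thread" per request.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: an event loop that hands each request to a worker
// thread so that it can return to listening immediately.
final class RequestLoopSketch {

    private static final String STOP = "__stop__"; // sentinel that ends the loop

    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final List<String> handled = new CopyOnWriteArrayList<>();

    void submit(String request) {
        inbox.add(request);
    }

    void stop() {
        inbox.add(STOP);
    }

    // Runs until the stop sentinel arrives, then waits for in-flight handlers.
    void runLoop() {
        try {
            while (true) {
                String request = inbox.take(); // block until a message arrives
                if (request.equals(STOP)) {
                    break;
                }
                // Handle the request off the loop thread.
                workers.execute(() -> handled.add("done:" + request));
            }
            workers.shutdown();
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            workers.shutdownNow();
        }
    }

    List<String> handled() {
        return handled;
    }

    public static void main(String[] args) {
        RequestLoopSketch loop = new RequestLoopSketch();
        loop.submit("query-1");
        loop.submit("query-2");
        loop.stop();
        loop.runLoop();
        System.out.println(loop.handled().size() + " requests handled");
    }
}
```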

The Learning subsystem has one or more extra threads to handle the scheduling of data mining. Data mining can be time consuming and should not force requests to wait while previous data is analyzed to determine behavior. The event loop thread in the Learning subsystem will take requests as they are received, determine how to respond (i.e., should the user "learn" something?), and then respond to them. The data mining thread analyzes the data from previous requests and makes suggestions for future behavior, which are stored until the next request is received and "how to respond" needs to be determined.
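The split between a fast request path and a background data mining thread can be sketched in Java as follows. Everything here is an assumption made for illustration: the class and method names, the download-count threshold that stands in for real data mining, and the use of "database" and "cache" as the two possible targets.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: requests are answered from the most recent behavior
// model while a background thread rebuilds that model at a fixed interval.
final class BehaviorModelSketch {

    private static final int FREQUENT = 3; // placeholder threshold, not from the design

    // Download counts logged so far, keyed by document ID.
    private final Map<String, Integer> downloadLog = new ConcurrentHashMap<>();
    // Read-only snapshot used by the request path; swapped on each rebuild.
    private final AtomicReference<Map<String, String>> model =
            new AtomicReference<>(Collections.<String, String>emptyMap());
    private final ScheduledExecutorService miner =
            Executors.newSingleThreadScheduledExecutor();

    void logDownload(String documentId) {
        downloadLog.merge(documentId, 1, Integer::sum);
    }

    // Placeholder "data mining": frequently downloaded documents are
    // recommended for the dealer database, everything else for the cache.
    void rebuildModel() {
        Map<String, String> next = new HashMap<>();
        downloadLog.forEach((doc, count) ->
                next.put(doc, count >= FREQUENT ? "database" : "cache"));
        model.set(Collections.unmodifiableMap(next));
    }

    // Fast path: never waits for mining; may be stale until the next rebuild.
    String recommendTarget(String documentId) {
        return model.get().getOrDefault(documentId, "cache");
    }

    // Rebuild at the interval chosen by the system administrator.
    void start(long intervalSeconds) {
        miner.scheduleAtFixedRate(this::rebuildModel,
                intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    void shutdown() {
        miner.shutdownNow();
    }
}
```

Because the request path only reads the latest snapshot, a longer rebuild interval makes recommendations staler but never slower, which is the tradeoff described in Section 1.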

The User Interface subsystem may use extra threads to watch for specific user interface events, such as mouse movements and key presses. Once a click or key press occurs, the event handler of the affected control will be invoked.

7.4 User Interface

Each subsystem has an event loop and user interface. The user interface of some systems may be text based for simplicity; however, the actual user interface (the interface a user and not an administrator would see) is graphical/window based. The user interfaces for the other subsystems may be provided for administrative purposes only.

 

8. Boundary Conditions

8.1 Initialization

Initialization must be considered at two different levels. The server level (COMET servers and dealer servers) has to be initialized first. COMET servers will be initialized by the STAR Network through an IOU conversion mechanism. At the dealer server level, which is our main concern, CDs that contain tagged data will be fed into a conversion program, which will in turn initialize the dealer servers.


Figure 4: Initialization of the PAID system

 

When the system is in the steady state, in other words, when the system is up and running, the user can start a session. To start their session, the user must insert their Smart Card. Upon detecting the card, Authentication will use the information to log the user in and then call User Interface to start the needed windows. The User Interface will interact with Authentication to ensure that the user is provided with the parts of the application they are allowed to see. From these windows, the user will be able to make a request for data, which will be transferred to the Database subsystem. Database will then return the results of the query to the User Interface subsystem for display to the user.

 

 


Figure 5: Startup of a user session

 

8.2 Termination

A user session will be terminated when the Authentication subsystem detects the removal of the user's Smart Card. Authentication will tell User Interface to end the user's session.
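The session lifecycle driven by Smart Card insertion and removal can be sketched in Java as follows. The class and method names are hypothetical, the user records are held in an in-memory map standing in for the local store or COMET server, and the plain string comparison of keys is for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the session lifecycle driven by Smart Card events.
final class CardSessionSketch {

    // Stand-in for user records in the local store or on a COMET server.
    private final Map<String, String> keyByUserId = new HashMap<>();
    private String activeUser; // null while no session is running

    void register(String userId, String key) {
        keyByUserId.put(userId, key);
    }

    // Card inserted: start a session only if the ID and key match a record.
    // A plain comparison is used here for illustration only.
    boolean onCardInserted(String userId, String key) {
        String expected = keyByUserId.get(userId);
        if (expected != null && expected.equals(key)) {
            activeUser = userId;
            return true;
        }
        return false;
    }

    // Card removed: end whatever session is running.
    void onCardRemoved() {
        activeUser = null;
    }

    boolean sessionActive() {
        return activeUser != null;
    }

    public static void main(String[] args) {
        CardSessionSketch auth = new CardSessionSketch();
        auth.register("dealer-17", "long-card-key");
        System.out.println(auth.onCardInserted("dealer-17", "long-card-key")); // true
        auth.onCardRemoved();
        System.out.println(auth.sessionActive()); // false
    }
}
```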

8.3 Failure

There are several possible failures that can occur at various levels. The system must be designed so that it attempts to recover from them. If a network server link fails, the system should reroute traffic to the server over an alternate link, but if a failure occurs on the dealer LAN (Local Area Network), the resolution will be up to the individual dealer. If a local database fails, a special server dedicated to update processes should provide recovery facilities. A failure can also occur in a server database, in which case the system should maintain continuous availability through mirroring and replication combined with hot incremental backups.

 

 

