Database complete chapter


                    


Data: A collection of unstructured, uninterpreted, and isolated raw facts and figures that lack any specific meaning is called data. They are gathered at random. so that the intended outcome can be achieved through processing.

Aashik , 1, Pokhara, 16

Information: Data that has been processed into information. Data are interpreted to provide meaning for the whole outcome in information. .

Aashik has rank 1 and lives in Pokhara 16.

 Problems with old type of data storing technique:

  1. Some information was stored in more than one file. (Data redundancy).
  2. Difficult to present data from the user’s view.
  3. Data security was one of the major problem.
  4. Data processing and manipulation were difficult.
  5. Data retrieval were very slow.

Therefore, the idea of a database was created to get around the aforementioned problems or constraints. A database is an arrangement of linked items or data that is tabulated and kept in one location. It provides an organization with incredibly helpful information for data manipulation and decision-making. It offers a framework for the well-organized management of massive volumes of data. A database is, in essence, a methodical and structured type of data that allows users to quickly and easily obtain or access data whenever needed. For instance, a phone book, dictionary, client information, and a marks ledger.

Roll No Name Address Contact
1 Sindhu Lamsal Pokhara 5493854
2 Sujan Pariyar Africa  3452423

Limitations of lod file storing technique:
  1.  One of the main issues with the flat file system is the data security system.
    2. Data sharing was extremely challenging or impossible.
    3. A certain record was kept in multiple files. (Redundant data)
    4. Data representation from the user's perspective is challenging.
    5. Processing complicated requests is challenging.

DBMS (Database Management System)

DBMS is a computerized record keeping system. It is a software that defines, manipulates and manage the database. It allows to access the file, update the record and retrieve data as requested. In other word, DBMS is defined as the collection of interrelated data and set of programs to access there data. The collection of data is usually a database which contains the information about any particular organization. The primary goal of DBMS is to provide an effective and efficient environment for both data retrieval and storing data in database. For example MS Access, Oracle, MY SQL, Fox-Pro, D-base etc

Advantages of DBMS

1) Data sharing: A database management system (DBMS) allows users or computers to share data that is stored in a database. For instance, bank branches exchange database data.

2) Decrease data redundancy: Repeating or duplicating the same information repeatedly is referred to as data redundancy. Such needless data repetition is decreased by DBMS.

3) Data backup and recovery: By using database management systems (DBMS), we may quickly make backup copies of original files and documents, which we can then use in the event of unintentional or deliberate loss or damage.

4) Data security: We can prevent unauthorized users from accessing databases by using DBMS. It contributes to date security.

5) Multiple user interfaces: Database management systems (DBMS) provide data sharing, allowing access to the same data from many devices and

Disadvantages of DBMS

1) Expensive: It may be expensive to run and operate DBMS for any organization.

2) Changing Technology: It is very much difficult to incorporate latest changing technology in existing system. In order to create and maintain database technical manpower and trainings are required.

3) Backup is needed: Since data maybe damage anytime accidentally or intentionally. So, it is mandatory to create a backup.

Database Model

There are different forms of Database Management system. Each characterized by the way where data are defined and structured. This arrangement of data in several structure are known as data base model.

Hierarchical database model

It's among the earliest categories of database models. The data in this model are shown as records, each of which contains several fields or attributes. Every record is set up in the database in a form like a tree. The connection between the data is known as a parent-child relationship, wherein a single child record pertains exclusively to a single parent, meaning the child solely has the property of that parent. Here, children are not allowed to use a parent's property that does not belong to them.

Advantages

  1. It is the simplest and the easiest model.
  2. It supports one to one or one to many relationship.
  3. Searching is easier and faster if parent is known.

Disadvantages

  1. It is an old fashion and outdated database model.
  2. It doesn’t support many to one relationship.
  3. It doesn’t reduce data redundancy because some data are written over different places.

Network database model

Due to certain model limitations, the hierarchical model was replaced by this network model. In the event that an employee works for two different departments, the hierarchical model will not be able to properly arrange the records. Thus, the non-hierarchical database was organized using the network database model. Database structures resemble graphs more than tree structures. A network model is made up of an assortment of records that are connected to one another through relationships. Every record has a number of fields, but only one data value for each field. This kind of model allows for the possibility of multiple parents for both parents and children.

Advantages

  1. It accepts many to many relationships. So, It is more flexible .
  2. It reduces data redundancy.
  3. This network mode is simple and easy to design.
  4. Searching is faster due to use of multi-directional pointer.

Disadvantages

  1. Needs long program to handle the relationship.
  2. Lack of structural independence.
  3. Less security

Relational database model

The relational database model arranges the data into tables with numerous rows and columns. We refer to these tables as relations. A table's rows show the relationships between a group of values. Given that a table is an assortment of these relationships. The relational database model gets its name from the mathematical term relations, which is commonly used. Another name for it is RDBMS. RDBMS is the name of the database system that keeps and shows data in a spreadsheet-like tabular format with rows and columns. These days, it is the most useful DBMS available. For instance, Oracle, MY SQL, MS-Access, etc.

Advantages

  1. There is less data redundancy.
  2. Breaking of complex database into simple is very much easier.
  3. Database processing is faster than other model.

Disadvantage

  1. Establishing more relationships complex.
  2. It requires powerful computer and data storage device.

Centralized database VS Distributed database

Centralized database

It is a basic kind that operates on a client-server model. Clients or users in this type are directly connected to the server, which is located in the center. This server helps users and clients store and retrieve data as needed by hosting their data. Small-scale businesses that don't have to deal with a lot of data and users use these kinds of systems. A centralized database is operated by a single computer that can support one or more users. Security plays a less important role here because the database is centralized. Because data are kept centrally, database maintenance is made simpler. This kind of system indicates that data access by unauthorized parties is possible.

Advantages

  1. Suitable for small scale industries.
  2. Operation and maintenance is easier.
  3. Since it prevent unauthorized person being accessed to database, it minimizes risk factor.

Disadvantages

  1. Data are not secured in this type of system.
  2. Not suitable for large scale industries.
  3. Failure of centrally located serves will collapse whole network.

Distributed database

This kind of database system has a complicated structure since it uses multiple databases and servers that are dispersed randomly across different locations to store and retrieve data rather than using a single server that is centrally located. It is made up of several logically connected databases spread across various geographic locations. Users can enjoy a good speed of bandwidth because the servers are spread out across different locations. Likewise, the process of creating backups and recovering data is much simpler there, increasing data security. Large organizations that deal with a lot of data and users from all over the world use this kind of system. Due to their distributed nature, they can present security risks and require high operating and maintenance costs.

Advantages

  1. Backup and recovery of data is easier.
  2. It can handle large volume of data and user all over the word.
  3. User can experience high speed bandwidth.

Disadvantages

  1. Very expensive to operate and maintain.
  2. Data security may be real issue.
Differences between centralized and distributed database system.
Centralized Distributed
Simple type Complex type
Located on particular location Many geographical location
Only one server Multiple server in many location
Suitable for small scale industries Suitable for large scale industries
Maintenance is easy Maintenance is difficult
Security is high Security is low
Low speed High Speed
Cheap Expensive
Failure of server affect whole network Doesn’t affect whole network
High chance of data loss Less chance of data loss

Structure query Language (SQL)

SQL stands for Structured Query Language. It is an international standard data base query language for accessing and managing data in the database. SQL was introduced and developed by IBM in early 1970’s. It was able to control relational database. SQL is not a complete programming language. It is only used for communicating with database. SQL has statement for data definition (DDL), data manipulation (DML) and data control (DCL). A query is a request to a DBMS for the retrieval, modification, insertion and deletion of the data from database.

SQL is made of three sub languages: DDL, DML and DCL

1) DDL (Data Definition Language): DDL is used by the database designer and programmers to specify the content and the structure of table. It is used to define the physical characteristics of record. It includes commands that manipulate the structure of object such as tables: For eg, to create table

Syntax:

CREATE TABLE table_name (field_name1 data_type1 field_name2 data_type2 ………);

CREATE TABLE Student (SN Number Fname text);

2) DML (Data Manipulation Language): DML is related with manipulation of records such as retrieval, sorting, display and deletion of records or data. It helps user to use query and display report of the table. It provide technique for processing the database. It includes commands like insert, delete, select, and update to manipulate the information stored in the database.

Syntax:

INSERT INTO table_name VALUES (list of values);

INSERT INTO student VALUES (1 RAM);

3) DCL (Data Control Language): All provides additional feature for security of table and database. It includes commands for controlling data and access to the database. Some of the example of this command are grant, commit, etc

Entity Relationship Database model (ER Model)

The entity relationship database model (ER diagram) is based on the perception of a real world that contains a collection of basic object called entities and relationship among these objects. The ER diagram is an overall logical structure of database that can be expressed graphically. It was developed to facilitates database design. The major objectives of ER diagram is to show relationship among different entities. It has following components.

1) Entity: The distinguishable object of this real world is known as entities. It has a set of properties which uniquely identifies an entity. For eg, if student is an entity then his/her name may be property. It is denoted by rectangle.

2) Attributes: Attributes are the properties possessed by an entity. They are represented by ellipse or oval sign. For eg, if student is an entity then its attribute can be registration number, name, roll no, class, address, etc.

3) Link: The flow of information is indicated by the link in ER diagram. It is simply denoted by a line. It is a connection of entity, attributes and relationships.

4) Relationships: A relationship is a association among several entities. It is represented by diamond. For eg, if teachers and students are two entities the association can be derived as teacher teachers students. It shows meaningful dependencies between several entities. There are 3 types of relationships. One to one. One to many, Many to many.

DBA (Database Administrator)

A DBA with solid DBMS knowledge is the most responsible person in the organization. He or she is in charge of the program's overall administration. With regard to defining the role of the employee using the system and gaining access to databases, he or she possesses the greatest number of privileges. DBAs are primarily responsible for maintaining the security, functionality, and timely delivery of information to users via the database server.

Qualities of good DBA

  1. He/she should have sound and complete knowledge about DBMS and its operation.
  2. He/she should be familiar with several DBMS packages such as MS Access, MY SQL, Oracle etc
  3. He/she should have depth knowledge about the OS in which database server is running.
  4. He/she should have good understanding of network architecture.
  5. He/she should hove good database designing skill.

Responsibilities

  1. DBA has responsibility to install, monitor, and upgrade database server.
  2. He/she should has responsibility to maintain database security by creating backup for recovery.
  3. He/she has responsibility to conduct training on the uses of database.
  4. DBA defines user privilege, relationships and manages form, reports in database.

Normalization

The process of breaking down or decomposing as complex relation into simple relation. It reduces redundancy (unnecessary repetition of data) using principle of non-loss decomposition in which table are reduce to smaller tables without loss of information.

Normalization is the database design process in which complex database table is broken down into simple separate tables. It makes data model more flexible and easier to maintain.

EF Codd has introduced few rules for normalizing the database in 1970 and these rules are known as normal forms. This process minimizes and controls the duplication of data in a database and also provides a rapid search for data from database.

Unnormalized database

Emp_code January
Emp_name February
Address March
Contact no. April
Date of birth May
Department June
Designation July
Basic_salary Daily_allowance
Travel_Allowance Gross_salary
Tax Provident _fund

Normalized database

Employee Salary Month
Emp_code Basic_Salary January
Emp_name Travel_allowance February
Address Daily_allowance March
Contact no. Gross_salary April
Date of birth Provident_funt May
Department Tax June
Designation July

Advantages of normalization

  1. It reduces data redundancy (duplication of data)
  2. It improves faster sorting and indexing.
  3. It simplifies the structure of the database table.
  4. It improves the performance of a system.
  5. It avoids loss of information.

Normal Forms

Let us consider a following unnormalized table.

Name Roll Class Sub_name Sub_marks Sub_name Sub_marks
Ram 1 11 Computer 95 Account 78
Sita 1 12 Computer 98 Account 80
Hari 2 11 Computer 80 Account 82
Shyam 2 12 Computer 92 Account 83

A) 1NF ( First Normal form)

  1. 1NF sets the very basic rules for on organized database.
  2. It eliminates duplicate columns from the same table.
  3. It creates separate tables from each group of related data and identify each row with a unique column called primary key.

The objective of 1NF is to divide the base datas into logical units called tables.

In above table, we can see that column of subject nome and marks are repeated which are eliminated in 1NF.

Name Roll Class Sub_name Sub_marks
Ram 1 11 Computer 95
Ram 1 11 Account 78
Sita 1 12 Computer 98
Sita 1 12 Account 80
Hari 2 11 Computer 80
Hari 2 11 Account 82
Shyam 2 12 Computer 92
Shyam 2 13 Account 83

B) 2NF (Second Normal form)

  1. It further addresses the concept of remaining duplicate data.
  2. It should be in first normal form.
  3. It removes the date that applies to multiple row of a table and place them in separate tables.

The objective of second NF is to take data i.e. partly dependent upon the primary key and enter the data into another table.

In above table name depends upon roll no and class, subject name only depends upon class, subject marks depends upon name and subject_name. Hence, above table can be decomposed as:

Name Roll Class
Ram 1 11
Sita 1 12
Hari 2 11
Shyam 2 12
Subject Class
Computer 11
Account 11
Computer 12
Account 12
Name Sub_name Sub_marks
Ram Computer 95
Ram Account 78
Sita Computer 98
Sita Account 80
Hari Computer 80
Hari Account 82
Shyam Computer 92
Shyam Account 83

3NF (Third Normal Form)

  1. It should be in second normal form.
  2. It removes the column that are not dependent on primary key using 3NF above table can be decomposed as:
Sub_id Subject
C1 Computer
A1 Account
Class_id Class
XI 11
XII 12
Std_id Class Roll Class_id
1 Ram 1 XI
2 Sita 1 XII
3 Hari 2 XI
4 Shyam 2 XII
Std_id Sub_id Marks
1 C1 95
1 A1 78
2 C1 98
2 A1 80
3 C1 80
3 A1 82
4 C1 92
4 A1 83

Important questions form this chapter are

1)DBMS with its objectives .

2) Differentiate between DBMS and RDBMS with example

3) Describe the centralized and distributed database model with figure.

4) What is relational database model? List out the advantage of this model.

5) What is hierarchical database model? list out its advantage and disadvantage.

6) What is normalization? Explain the normalization process with example.

7) Define DBA. what are the roles/ responsibilities and criteria of a good DBA.

8) Differentiate between database and DBMS with example.

9) write short notes on

  1. ERD
  2. SQL -> DDL, DML, DCL
  3. Data Integrity
  4. Data security
  5. Data Dictionary