Data Structure: Overview

[ Home Page ] [ Back ] [ Content of Contributions ] [ Movement: Green Leaf World ]

----------------------------------------------------------------------------------------------------------

In the last article, we have finished with a list of data structures. In this article, we will give a closer look on each of them briefly. In this article we are not going to look into the matter of data. Our concentration will be limited to the basic structure of each of the data structure listed in the previous article.

What was the list? can you remember? No! Ok, Have a look please....

ARRAY

Array is a linear, contiguous, static data structure to represent homogeneous elements. Array is also called sub-scripted variable. Array is randomly accessible.

The above statement looks very simple, but, it is not. There are five terms, we need to understand precisely. Before going to the details of the terms, we must know what happens, when an array of 10 integers is declared in C programming language.

C-Definition: int Arr[10];

In the above statement, it is being declared that,
As per Memory View: "Allocate ( 10*Word_size_in_Byte(int) ) Bytes of consecutive memory cells and the Base Address will be stored in Arr". It seems to be harder? Yes, quite natural. This is what happens in the memory, it is referred earlier as Memory View. Word_size_in_Byte(int) is the integer value, that indicates, number of consecutive bytes required to represent an integer value in the memory. In a 32 bit operating system, it is four bytes. So, 40 bytes of consecutive memory will be allocated by the above statement. Base Address refer to the address of the first byte allocated memory. If we assume that, the starting address of the allocated memory is 2500, then 2500 is the Base Address of the array Arr.

Or,
High Level: A set of 10 cells (read Array Cells) is allocated in the memory. This is simple to understand, but remember, it is an illusion(abstract), not the fact. actually, 40 memory cells is encoded in 10 array cell. We refer 1 array cells, but, the system works with 4 memory cell. What we uses in the program, is the logical model (array of 10 cells), which must be properly mapped into physical memory(array of 40 memory cells) before it works.

A little bit confusing? Ok have a look on the following figure of Arr.

Here, Arr[0] is the notation used to identify array cell 0, but actually Arr[0] is mapped into memory cells {2500, 2501, 2502 & 2503}. So, Arr[1] is mapped into memory cells {2504, 2505, 2506 & 2507}.

Are you thinking how it is possible? Can you thinking for a mapping function that will map Arr[0] into 2500 and Arr[1] into 2504? Yes, this is the low level Access Method.

Then what is High Level Access Method? algorithm, to access the logical(high level view) structure.

So, what is an array of 10 cells? is not a model(apart from reality, like a Taj-mahal made of soil)? where as, if we can think array of 40 (memory) cells, then it is true(physical view). We do not need to worry with 40 cells, we will handle 10 cells. This is the reason, Data Structure is called a Model.

The memory is fixed and having the structure shown above. This structure is call linearity. Linearity means one after another. Palm tree is linear but mango tree is not.

Now, we can start the main discussion,
Linear Data Structure: If we try to travel(logically) in the array, is there will be any point(0, 1, 2, 3,..) where we may have more than one alternative points ahead and we have to select on which path should we go first? No, we can see the structure, we are moving from 0 to 9, there are one and only one path always. So, it is linear.

Contiguous Data Structure: Entire 40 bytes of the memory must be consecutive, i.e. from 2500 to (2500+40-1). It is simple but remember the entire memory must be allocated at once (may at load time or during run time).

What is the size of the array declared? 10.
Always there will be 10 data? No.
What is the minimum number of data can be stored in this array? 0.
What is the maximum number of data can be stored in this array? 10.
Can we store first data in 0 and second data in 5? No.
Let, Size=10. Is Number of data present is Size? No. we have to take another value, which must be initially set to 0, as no data is present at the time of declaration.
Do you know, how much data may be encountered in future? Obviously, No, only it can be predicted a little bit, with the assumption "Prediction is subject to fail".
Can you decrease the size of the array Arr from 10 to 5, after starting of the execution of the program if I ensure you that I will give maximum 5 data? No, program already started....
Can you increase the size of the array Arr from 10 to 15, after starting of the execution of the program if I force you to insert 11th data? No, program already started...

Oh, what is going on. I am asking, I am answering. Do not get frustrated. Go and have some water or tea, its a blog just, you can read it later.

Static Data Structure: Entire 40 bytes of the memory must be declared and allocated before the use of the array. The memory may be allocated in Load time/ run time, all the properties will remain constant through out the remaining life of the program execution, except the data values. In run time allocation, once allocated, can be de-allocated entirely but can not be altered(increased/ decreased) in size, although it can be allocated in run time. So, in static data structure,

All the structural specification(like name, size, type) must be given prior(Load/run time) to use and
Can not be altered further.

If I want to do X= Arr[3]+Arr[7]; is it allowed? Yes, this is the randomness of the array structure. So, it is called randomly accessible structure.

What is Load Time declaration of an array ? int Arr[10];
What is Run Time declaration of an array ?
Arr= (int *) malloc (40);
here, Arr must be declared as (int *) earlier w.r.t this line. It is run time but static(rigid) allocation.

Oh, what is the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} used to explain the array? It is call index set defined as set of integers starting from 0(or 1) ranges upto K-1(or K), where K is the Size of the array, i.e number of array cells.

Be careful about the terms Memory Cell/ Array Cells, and Address/Index. These are not interchangeable but related terms. One is Low level term another is high level term.

Advantage of the array is its randomness.
Disadvantage is its static behavior.

We will study detail operations on array in next few articles as,

Operations on Unordered list of numeric data
Operations on Ordered list of numeric data
Operations on Ordered list of Characters (String)
Operations on two dimensional numerical data set

LINKED LIST

Linked List is a linear, non-contiguous & dynamic data structure, where values are represented in terms of node and inter-relationship among the nodes are maintained by means of link.

It is expected everyone are familiar with the concept of structure & pointer. If not, go through these two topics before proceeding to next part of this article.

See the examples in figure 2, a structure can have many fields of any types, there are no restrictions in the types of the associated fields.

In linked list, structure is identified as node, which is used to represent data. All the nodes are linked together.

See the example in figure 3, it shows the logical & physical view both.

Logically, we assumes,

There is a pointer to the structure, here LIST_Pointer, points to the first node of the list.
Each node has two fields, Data & Link. Data field is used to store value Link field is used to point to the next node of the list or NULL to indicate the last node of the list

Physically, pointing to a node means holding the base address of that node.

The main advantage of the linked list is that, node can be dynamically allocated/ de-allocated. Thus, utilization of the memory is efficient compare to the array.

The disadvantage is obviously the additional space required from links.

Depending on the link structure Linked list can classified as,

One way(Singly) linked list: as shown in the figure 3.
Two way(Doubly) linked list: each node point to the previous node in addition to next node. Previous pointer of the first node holds NULL.
Header linked list: a special node is used to hold metadata additionally
Circular linked list: last node points to the first node instead of NULL.

We will study detail operations on different linked list after array operations.

Queue & Stack

Basically, array & linked list are two only structure, using which all other data structures has been developed. Two most useful & special purpose data structures are Queue & stack.

Queue: a linear list of elements, where insertion & deletion is performed from opposite end. It is also called First In First Out(FIFO) list. It is also called, service discipline.

Stack: a linear list of elements, where insertion(called Push) & deletion(called Pop) is performed from only one end. It is also called First In Last Out(FILO) list.

Both of Queue & Stack are very useful data structure, used to solves many interesting problems.

It must be noted that, both of Queue & Stack can be implemented using array or linked list, will be discussed later.

Binary Tree

Till now we have discussed about liner data structures. Now, introducing , non-linear data structure. Before the definition, we will see an example of tree to better realize about the concept of non-linearity.

Look at the node, from a node, there are more than one links some time. So, let us consider that, at some time t, we are at the node with data 60. We want to move next, where should we go? to 70 or to 50? Is there are one alternative way or many alternative way? When, a structure demands a decision to select one out of many alternative paths, those kind of structure is known as non-linear structure.

There are so many tree structure, but for this time being we will look into (Rooted) Binary Tree structure only.

A binary tree T with N nodes can be defined as,

Either, T is empty
Or, T has a special node called Root of the tree. All the remaining (N-1) nodes will form two dis-join subset, say LST & RST, called Left Sub-Tree and Right Sub_Tree of T respectively, such that, LST & RST are both binary tree.

A pointer is used to point to the Root node of a tree, as like in the linked list.

There are so many variations in tree structure:

Full binary tree: a binary tree, with the restriction that, all the levels will contain at most data.
Complete binary tree: a binary tree, with the restriction that, all the levels will contain at most data except possibly the last level, where less number of possible data are allowed, but all data will be left oriented.
Binary Search tree: a binary tree, with the restriction that,

all the nodes in the left sub-tree of each node will contain data having value <= w.r.t that node, and
all the nodes in the right sub-tree of each node will contain data having value > w.r.t that node, and

Heap tree

Max Heap tree: a complete binary tree, with the restriction that, all the sub-tree must having the highest value in its root.
Min Heap tree: a complete binary tree, with the restriction that, all the sub-tree must having the lowest value in its root.

Balanced Binary Search tree (AVL tree): as name refers, a binary search tree, which is balanced.

We will study detail operations on different tree in different articles.

Hashing

Hashing is not any data structure, rather it is searching technique, tries to reduce searching time independent to number of data elements.

Some Related Issues

Searching Problems: related to searching of different values.
Ordering Problems: related to arrangement of values.

---------------------------------------------------------------------------------------------------------------------------

DEDICATED TO