Friday, June 19, 2015

A Brief Introduction to Data Structures

Česky:
Česky: (Photo credit: Wikipedia)
Hello Everyone,
 
So today I will discuss about Data structures and some other features related to it and its area of applications. Keeping the tradition in place as always, I would like you to read the following note before proceeding.

Readers Note:-
  • In this part don't expect that this is the limit of data structures as the things I will mention here are the most basic elements
  • My further tutorials on Data structure,if any, they will assume that you know at least know some programming, though the codes and explanation will be java based, but if you have a problem then post here.
  • This part will be a theoretical one.
  • Data Structures and algorithm's are two different aspects. So, don't think I will be teaching algorithms (To be true I am still not very proficient with principles of algorithm I need 5 more months to become able to teach others.)
  • Lastly and most important point do read till the end of this tutorial.
Tutorial Objectives
  • What are Data Structures ?
  • Abstract Data type (ADT).
  • Classification of Data Structures.
  • Basic Operations on Data Structures.
  • Relation between Data Structures and Algorithms
  • Basic Data Structures :- Arrays, Stack, Queue, Lists, Trees and HashTable
  • Application of Data Structures
  • Conclusion.
  • Recommended Reading.
Let's begin


What are Data Structures ?

Perhaps this the most important question in this particular tutorial. So let us define data structures.

A Data Structure is a particular way of storing and organizing data in a computer so that it can be used efficiently. The data can be more than one variable. They also contain algorithms to process the data stored.

Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address. The implementation of a data structure usually requires writing a set of procedures that create and manipulate instances of that structure. The efficiency of a data structure cannot be analysed separately from those operations. A data structure may be defined indirectly by the operations that may be performed on it, and the mathematical properties of those operations (including space and time complexity.)

Abstract Data Type (ADT)

An abstract data type is a data type that satisfies two conditions:-
  1. The representation, or definition, of the type and the operations are contained in a single syntactic unit.
  2. The representation of objects of the type is hidden from the program units that use the type, so only direct operations possible on those objects are those provided in the type’s definition.
Another Definition :-

An abstract datatype is an encapsulation mechanism. In general it is composed of several components:-
  • A data structure or structures (often called the sorts)
  • A set of operations (called the methods or operations).
  • A precise description of the types of the methods (called a signature).
  • A precise set of rules about how it behaves (called the abstract specification or the axiomatic description).
  • An implementation hidden from the programmer who uses the data type.
Example of an ADT (I assume that you know what array means)
Array ADT

create(A,N)                   :     creates an arrayAwith storage forNitems;
bool isEmpty()                : - Retrurns true if array is empty
bool isFull()                    : - Retrurns true if array is full
int length()                    : - Returns the length of the array
void add(int item,int pos) :– Adds the element at the given position
void del(int item)            : - Deletes the given item from the array.
void display()                 : - Prints the elements of array.

Note: Stack , Queue, Lists etc are all examples of ADT.

Classification of Data Structures

Data Structures are normally divided into two broad categories :-

Primitive Data Structures

The integers, real, logical data, character data, pointer and reference are primitive data structures. Data structures that normally are directly operated upon by machine-level instructions are known as primitive data structures.

Non-primitive Data Structures

These are more complex data structures. These data structures are derived from the primitive data structures. They stress on formation of sets of homogeneous and heterogeneous data elements.

Other classifications :-

Homogeneous and Heterogeneous Data Structures

Homogeneous data structure

Homogeneous data structures are those data structures that contain only similar type of data e.g. like a data structure containing only integral values or float values. The simplest example of such type of data structures is an Array.

Heterogeneous Data structure

Heterogeneous Data Structures are those data structures that contains a variety or dissimilar type of data, for e.g. a data structure that can contain various data of different data types like integer, float and character. The examples of such data structures include structures, class etc.

Static and Dynamic Data Structures

Static data structure

Static data structures are those data structures to which the memory is  assigned at the compile time, means the size of such data structures has to be predefined in the program and their memory can be increased or decreased during the execution. The simplest example of static data structure is an array.

Dynamic data structure

Dynamic Data Structures are those data structures to which memory  is assigned at the runtime or execution time and hence their size can be increased or decreased dynamically as per requirement of the program. The example of such data structures include class, linked lists etc.

Linear and Non-Linear Data Structures

Linear data structure

In linear data structures the data is kept in the sequential manner which means that the previous data has knowledge of next data or we can say that the data is interrelated. The example such data structures include arrays, linked lists etc.

Non-linear data-structure

Non-linear data structures does not maintain any kind of relationship among the data. The data is stored in non-sequential manner and examples of these types of data structures includes graphs, trees etc.


Basic Operations on Data Structure

Traversing : Accessing each record exactly once so that certain items in the record may be processed.(This accessing or processing is sometimes called 'visiting" the records.)

Searching : Finding the location of the record with a given key value, or finding the locations of all records, which satisfy one or more conditions.

Sorting : Arranging the data in some logical order.

Inserting : Adding new records to the structure.

Deleting : Removing a record from the structure.

Merging :
Combining the data of two different sorted file into a single sorted file.

Sometimes two or more data structure of operations may be used to create another operation.

Relation between Data Structures and Algorithms

An algorithm in data structure are step-by-step instructions that a computer can read. It is used for programming software and applications based on calculations.

Algorithms and Data Structures are tightly wound together. Algorithm depends on data structures, if you change either of them, complexity will change considerably. They are not same, but are definitely two sides of the same coin. Selecting a good Data Structure is itself a path towards better algorithm.

Basic Data Structures :- Arrays, Stack, Queue, Lists, Trees and Graph.

Arrays

Arrays are a very simple data structure, and may be thought of as a list of a fixed length. Arrays are nice because of their simplicity, and are well suited for situations where the number of data items is known (or can be programmatically determined). Suppose you need a piece of code to calculate the average of several numbers. An array is a perfect data structure to hold the individual values, since they have no specific order, and the required computations do not require any special handling other than to iterate through all of the values. The other big strength of arrays is that they can be accessed randomly, by index. For instance, if you have an array containing a list of names of students seated in a classroom, where each seat is numbered 1 through n, then studentName(i) is a trivial way to read or store the name of the student in seat i.

An array might also be thought of as a pre-bound pad of paper. It has a fixed number of pages, each page holds information, and is in a predefined location that never changes.

Linked Lists

A linked list is a data structure that can hold an arbitrary number of data items, and can easily change size to add or remove items. A linked list, at its simplest, is a pointer to a data node. Each data node is then composed of data (possibly a record with several data values), and a pointer to the next node. At the end of the list, the pointer is set to null.

By nature of its design, a linked list is great for storing data when the number of items is either unknown, or subject to change. However, it provides no way to access an arbitrary item from the list, short of starting at the beginning and traversing through every node until you reach the one you want.

Stack

Stacks are, in a sense, the opposite of queues, in that they are described as "last in, first out". The classic example is the pile of plates at the local buffet. The workers can continue to add clean plates to the stack indefinitely, but every time, a visitor will remove from the stack the top plate, which is the last one that was added.

While it may seem that stacks are rarely implemented explicitly, a solid understanding of how they work, and how they are used implicitly, is worthwhile education. Those who have been programming for a while are intimately familiar with the way the stack is used every time a subroutine is called from within a program. Any parameters, and usually any local variables, are allocated out of space on the stack. Then, after the subroutine has finished, the local variables are removed, and the return address is "popped" from the stack, so that program execution can continue where it left off before calling the subroutine.

An understanding of what this implies becomes more important as functions call other functions, which in turn call other functions. Each function call increases the "nesting level" (the depth of function calls, if you will) of the execution, and uses increasingly more space on the stack. Of paramount importance is the case of a recursive function. When a recursive function continually calls itself, stack space is quickly used as the depth of recursion increases.

Queue

A queue is a data structure that is best described as "first in, first out". A real world example of a queue is people waiting in line at the bank. As each person enters the bank, he or she is "enqueued" at the back of the line. When a teller becomes available, they are "dequeued" at the front of the line.

Perhaps the most common use of a queue is to implement a Breadth First Search (BFS). BFS means to first explore all states that can be reached in one step, then all states that can be reached in two steps, etc. A queue assists in implementing this solution because it stores a list of all state spaces that have been visited.

A common type of problem might be the shortest path through a maze. Starting with the point of origin, determine all possible locations that can be reached in a single step, and add them to the queue. Then, dequeue a position, and find all locations that can be reached in one more step, and enqueue those new positions. Continue this process until either a path is found, or the queue is empty (in which case there is no path). Whenever a "shortest path" or "least number of moves" is requested, there is a good chance that a BFS, using a queue, will lead to a successful solution.

Tree

Trees are a data structure consisting of one or more data nodes. The first node is called the "root", and each node has zero or more "child nodes". The maximum number of children of a single node, and the maximum depth of children are limited in some cases by the exact type of data represented by the tree.

One of the most common examples of a tree is an XML document. The top-level document element is the root node, and each tag found within that is a child. Each of those tags may have children, and so on. At each node, the type of tag, and any attributes, constitutes the data for that node. In such a tree, the hierarchy and order of the nodes is well defined, and an important part of the data itself. Another good example of a tree is a written outline. The entire outline itself is a root node containing each of the top-level bullet points, each of which may contain one or more sub-bullets, and so on. The file storage system on most disks is also a tree structure.

HashTable

Hash tables are a unique data structure, and are typically used to implement a "dictionary" interface, whereby a set of keys each has an associated value. The key is used as an index to locate the associated values. This is not unlike a classical dictionary, where someone can find a definition (value) of a given word (key).

Unfortunately, not every type of data is quite as easy to sort as a simple dictionary word, and this is where the "hash" comes into play. Hashing is the process of generating a key value (in this case, typically a 32 or 64 bit integer) from a piece of data. This hash value then becomes a basis for organizing and sorting the data. The hash value might be the first n bits of data, the last n bits of data, a modulus of the value, or in some cases, a more complicated function. Using the hash value, different "hash buckets" can be set up to store data. If the hash values are distributed evenly (which is the case for an ideal hash algorithm), then the buckets will tend to fill up evenly, and in many cases, most buckets will have no more than one or only a few objects in them. This makes the search even faster.

A hash bucket containing more than one value is known as a "collision". The exact nature of collision handling is implementation specific, and is crucial to the performance of the hash table. One of the simplest methods is to implement a structure like a linked list at the hash bucket level, so that elements with the same hash value can be chained together at the proper location. Other, more complicated schemes may involve utilizing adjacent, unused locations in the table, or re-hashing the hash value to obtain a new value. As always, there are good and bad performance considerations (regarding time, size, and complexity) with any approach.

Most standard libraries, such the Java API, and the .NET framework, provide a Stack, Queue, Lists, HashTables class that provides the various data structure operations. Details on how to use the API will be discussed later.

Applications of Data Structure

Every application uses Data structure but it is upto how you can use data structure to create more efficient programs and applications. To name some are :-
  • Compiler Design,
  • Network flow model
  • Operating System,
  • Database Management System,
  • Statistical analysis package,
  • Numerical Analysis,
  • Graphics,
  • Artificial Intelligence,
  • Simulation
Conclusion

To conclude I would say that data structures have a enormous application in the field of computing and it has touched us in every manner. I would recommend you to study data structure in great detail if you really want to be a good and efficient programmer. If you are interested  then I would ask you  to buy the books I have referred.

Recommended Reading

1. Art of Computer Programming, Volume 1: Fundamental Algorithms by Donald E. Knuth
2. Data Structures and Algorithm Analysis in C++, by Mark Allen Weiss
3. Algorithms in C++, Parts 1-4: Fundamentals, Data Structure, Sorting, Searching by Robert Sedgewick
References Taken

1. Data Structures by Lipshutz and another one by Baluja.
2. Wikipedia.
3. http://www.topcoder.com/

Thank You for reading my tutorial and I hope that you enjoyed it.

Thank you,
Sincerely,
Psycho_Coder.

0 comments :

Post a Comment

Follow Me!

Blog Archive

Followers

Visitor Map