Wednesday, 2 July 2014

Socket Programming



Socket:

Sockets allow communication between two different processes on the same or different machines.

Socket Types:

There are four types of sockets available to users. The first two are the most commonly used; the last two are rarely used.
Processes are presumed to communicate only between sockets of the same type, but there is no restriction that prevents communication between sockets of different types.

·       Stream Sockets: Delivery in a networked environment is guaranteed. If you send three items "A, B, C" through a stream socket, they arrive in the same order: "A, B, C". These sockets use TCP (Transmission Control Protocol) for data transmission. If delivery is impossible, the sender receives an error indicator. Data records do not have any boundaries.
·       Datagram Sockets: Delivery in a networked environment is not guaranteed. They are connectionless because you don't need an open connection as with stream sockets: you build a packet with the destination information and send it out. They use UDP (User Datagram Protocol).
·       Raw Sockets: These give users access to the underlying communication protocols that support socket abstractions. They are normally datagram oriented, though their exact characteristics depend on the interface provided by the protocol. Raw sockets are not intended for the general user; they are provided mainly for those interested in developing new communication protocols, or in gaining access to some of the more esoteric facilities of an existing protocol.
·       Sequenced Packet Sockets: These are similar to stream sockets, with the exception that record boundaries are preserved. This interface is provided only as part of the Network Systems (NS) socket abstraction, and is very important in most serious NS applications. Sequenced-packet sockets allow the user to manipulate the Sequence Packet Protocol (SPP) or Internet Datagram Protocol (IDP) headers on a packet or a group of packets, either by writing a prototype header along with whatever data is to be sent, or by specifying a default header to be used with all outgoing data. They also allow the user to receive the headers on incoming packets.
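The socket type is chosen when the socket is created. The following is a minimal sketch, assuming an IPv4 (AF_INET) socket; the helper names open_inet_socket and socket_type are hypothetical and only serve the illustration. The type argument would be SOCK_STREAM or SOCK_DGRAM for the first two types above (SOCK_RAW and SOCK_SEQPACKET also exist, but raw sockets usually need extra privileges).

```c
#include <sys/socket.h>

/* Hypothetical helper: open an IPv4 socket of the given type.
 * Returns the descriptor, or -1 on failure (socket() sets errno). */
int open_inet_socket(int type)
{
    /* Protocol 0 asks the system for the default protocol of this type:
     * TCP for SOCK_STREAM, UDP for SOCK_DGRAM. */
    return socket(AF_INET, type, 0);
}

/* Hypothetical helper: ask the system which type an open socket has.
 * Returns the type, or -1 on failure. */
int socket_type(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

Calling open_inet_socket(SOCK_STREAM) should give a TCP socket and open_inet_socket(SOCK_DGRAM) a UDP socket.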

 

Network Addresses - IP Addresses

Before we proceed with the actual programming, let's understand a bit about network addresses: the IP address.
The IP host address, or more commonly just IP address, is used to identify hosts connected to the Internet. IP stands for Internet Protocol and refers to the Internet Layer of the overall network architecture of the Internet.
An IP address is a 32-bit quantity interpreted as four 8-bit numbers, or octets. Each IP address uniquely identifies the participating user network, the host on the network, and the class of the user network.
An IP address is usually written in dotted-decimal notation of the form N1.N2.N3.N4, where each Ni is a decimal number between 0 and 255 (00 through FF hexadecimal).
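To make the relationship between the 32-bit quantity and its dotted-decimal form concrete, here is a small sketch. The helper names pack_octets and format_dotted are hypothetical, and the address is handled as a plain integer with N1 in the most significant byte (not yet in any particular byte order for the network).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: combine four octets N1.N2.N3.N4 into one
 * 32-bit value, with N1 in the most significant byte. */
uint32_t pack_octets(uint8_t n1, uint8_t n2, uint8_t n3, uint8_t n4)
{
    return ((uint32_t)n1 << 24) | ((uint32_t)n2 << 16) |
           ((uint32_t)n3 << 8)  |  (uint32_t)n4;
}

/* Hypothetical helper: write the dotted-decimal form of addr into buf.
 * A 16-byte buffer is enough for "255.255.255.255" plus the terminator. */
void format_dotted(uint32_t addr, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%u.%u.%u.%u",
             (unsigned)((addr >> 24) & 0xFF),
             (unsigned)((addr >> 16) & 0xFF),
             (unsigned)((addr >> 8) & 0xFF),
             (unsigned)(addr & 0xFF));
}
```

For example, pack_octets(192, 168, 0, 1) yields 0xC0A80001, and format_dotted turns that value back into "192.168.0.1".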

 

Address Classes:

IP addresses are managed and created by the Internet Assigned Numbers Authority (IANA). There are 5 different address classes. You can determine which class any IP address is in by examining the first 4 bits of the IP address.
  • Class A addresses begin with 0xxx, or 1 to 126 decimal.
  • Class B addresses begin with 10xx, or 128 to 191 decimal.
  • Class C addresses begin with 110x, or 192 to 223 decimal.
  • Class D addresses begin with 1110, or 224 to 239 decimal.
  • Class E addresses begin with 1111, or 240 to 255 decimal.

Addresses beginning with 01111111, or 127 decimal, are reserved for loopback and for internal testing on a local machine. (You can test this: you should always be able to ping 127.0.0.1, which points to yourself.) Class D addresses are reserved for multicasting; Class E addresses are reserved for future use. They should not be used for host addresses.
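The leading-bit rules above can be checked with a few bit masks. This is only a sketch; the function name address_class is hypothetical, and it takes the first octet (N1) of the address.

```c
#include <stdint.h>

/* Hypothetical helper: return 'A'..'E' for the class of an address,
 * judged from the leading bits of its first octet. */
char address_class(uint8_t first_octet)
{
    if ((first_octet & 0x80) == 0x00) return 'A';  /* 0xxx xxxx */
    if ((first_octet & 0xC0) == 0x80) return 'B';  /* 10xx xxxx */
    if ((first_octet & 0xE0) == 0xC0) return 'C';  /* 110x xxxx */
    if ((first_octet & 0xF0) == 0xE0) return 'D';  /* 1110 xxxx */
    return 'E';                                    /* 1111 xxxx */
}
```

Note that 127 falls in the Class A bit pattern even though it is reserved for loopback, so address_class(127) returns 'A'.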

 

Example:

Class | Leftmost bits | Start address | Finish address
A | 0xxx | 0.0.0.0 | 127.255.255.255
B | 10xx | 128.0.0.0 | 191.255.255.255
C | 110x | 192.0.0.0 | 223.255.255.255
D | 1110 | 224.0.0.0 | 239.255.255.255
E | 1111 | 240.0.0.0 | 255.255.255.255
 

 

Subnetting:

Subnetting an IP Network can be done for a variety of reasons, including organization, use of different physical media (such as Ethernet, FDDI, WAN, etc.), preservation of address space, and security. The most common reason is to control network traffic.
The basic idea in subnetworking (a common word also used is subnetting) is to partition the host identifier portion of the IP address into two parts:
  1. A subnet address within the network address itself; and
  2. A host address on the subnet.
For example, a common Class B address format is N1.N2.S.H, where N1.N2 identifies the Class B network, the 8-bit S field identifies the subnet, and the 8-bit H field identifies the host on the subnet.
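The N1.N2.S.H split can be expressed with shifts and masks. The sketch below assumes the address is held as a 32-bit integer with N1 in the most significant byte; all function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers for a Class B address N1.N2.S.H held as a
 * 32-bit integer with N1 in the most significant byte. */

/* The 16-bit network identifier N1.N2. */
uint16_t network_part(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* The 8-bit subnet field S. */
uint8_t subnet_part(uint32_t addr) { return (uint8_t)((addr >> 8) & 0xFF); }

/* The 8-bit host field H. */
uint8_t host_part(uint32_t addr) { return (uint8_t)(addr & 0xFF); }

/* Apply a subnet mask, keeping the network and subnet bits and
 * clearing the host bits. */
uint32_t subnet_address(uint32_t addr, uint32_t mask) { return addr & mask; }
```

For 172.16.5.9 (0xAC100509), the subnet field is 5, the host field is 9, and masking with 255.255.255.0 (0xFFFFFF00) gives 172.16.5.0.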
It is very difficult to remember many hosts by their numeric addresses, so hosts are generally known by "ordinary" names such as takshila or nalanda. We write software applications to find the dotted IP address corresponding to a given name.
The process of finding the dotted IP address for a given alphanumeric host name is known as hostname resolution.
Hostname resolution is done by special software residing on high-capacity systems. These systems make up the Domain Name System (DNS), which keeps a mapping between IP addresses and the corresponding ordinary names.

 

How to make a client:

The system calls for establishing a connection are somewhat different for the client and the server, but both involve the basic construct of a socket. The two processes each establish their own sockets.
The steps involved in establishing a socket on the client side are as follows:
1.      Create a socket with the socket() system call.
2.      Connect the socket to the address of the server using the connect() system call.
3.      Send and receive data. There are a number of ways to do this, but the simplest is to use the read() and write() system calls.

 

How to make a server:

The steps involved in establishing a socket on the server side are as follows:
1.      Create a socket with the socket() system call.
2.      Bind the socket to an address using the bind() system call. For a server socket on the Internet, an address consists of a port number on the host machine.
3.      Listen for connections with the listen() system call.
4.      Accept a connection with the accept() system call. This call typically blocks until a client connects with the server.
5.      Send and receive data using the read() and write() system calls.
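The client and server steps above can be sketched as two small helpers. This is an illustration under stated assumptions, not a complete program: both sides run on the loopback address, the server lets the system pick a free port (port 0), and the names start_listener and connect_to_loopback are hypothetical.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper covering server steps 1-3. Binds to the loopback
 * address on a system-chosen port and stores that port (host byte order)
 * in *port_out. Returns the listening descriptor, or -1 on error. */
int start_listener(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);            /* 1. socket() */

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                            /* 0 = any free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||  /* 2. bind() */
        listen(fd, 5) < 0 ||                                     /* 3. listen() */
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical helper covering client steps 1-2. Connects to the given
 * port on the loopback address. Returns the descriptor, or -1 on error. */
int connect_to_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);            /* 1. socket() */

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { /* 2. connect() */
        close(fd);
        return -1;
    }
    return fd;
}
```

After these calls, the server would call accept() on the listening descriptor (step 4), and both sides can exchange data with read() and write() on their connected descriptors.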

 

Client and Server Interaction:

Following is the diagram showing complete Client and Server interaction:




Socket Structures

There are various structures which are used in Unix Socket Programming to hold information about the address and port and other information. Most socket functions require a pointer to a socket address structure as an argument. Structures defined in this tutorial are related to Internet Protocol Family.
The first structure is struct sockaddr that holds socket information:
struct sockaddr{
        unsigned short  sa_family;    
        char            sa_data[14];
};
This is a generic socket address structure which will be passed in most of the socket function calls. Here is the description of the member fields:
Attribute | Values | Description
sa_family | AF_INET, AF_UNIX, AF_NS, AF_IMPLINK | This represents an address family. In most Internet-based applications we use AF_INET.
sa_data | Protocol Specific Address | The 14 bytes of protocol-specific address are interpreted according to the type of address. For the Internet family we use a port number and an IP address, which are represented by the sockaddr_in structure defined below.
The second structure, which helps you reference the socket's elements, is as follows:
struct sockaddr_in {
        short int           sin_family;  
        unsigned short int   sin_port; 
        struct in_addr      sin_addr; 
        unsigned char       sin_zero[8];
};



Here is the description of the member fields:
Attribute | Values | Description
sin_family | AF_INET, AF_UNIX, AF_NS, AF_IMPLINK | This represents an address family. In most Internet-based applications we use AF_INET.
sin_port | Service Port | A 16-bit port number in Network Byte Order.
sin_addr | IP Address | A 32-bit IP address in Network Byte Order.
sin_zero | Not Used | Padding that is not used; just set all of its bytes to zero.
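Filling in a sockaddr_in usually follows the same few steps. The sketch below assumes a dotted-decimal IPv4 string and a port in host byte order; the helper name fill_sockaddr_in is hypothetical.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *addr for the given dotted-decimal IPv4
 * address and port (host byte order). Returns 0 on success, or -1 if
 * ip is not a valid address string. */
int fill_sockaddr_in(struct sockaddr_in *addr, const char *ip,
                     unsigned short port)
{
    memset(addr, 0, sizeof(*addr));        /* also zeroes sin_zero */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);          /* port in Network Byte Order */
    if (inet_aton(ip, &addr->sin_addr) == 0)
        return -1;                         /* inet_aton returns 0 on error */
    return 0;
}
```

The filled structure is then passed to calls such as bind() or connect() after a cast to struct sockaddr *.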


The next structure is used only as a field of the structure above and holds the 32-bit netid/hostid.
struct in_addr {
        unsigned long s_addr;
};
Here is the description of the member fields:
Attribute | Values | Description
s_addr | IP Address | A 32-bit IP address in Network Byte Order.


There is one more important structure. This structure is used to keep information related to host.
struct hostent
{
  char  *h_name;
  char  **h_aliases;
  int   h_addrtype;
  int   h_length;
  char  **h_addr_list;
#define h_addr  h_addr_list[0]
};
Here is the description of the member fields:
Attribute | Values | Description
h_name | ti.com etc. | This is the official name of the host. For example tutorialspoint.com, google.com etc.
h_aliases | TI | This holds a list of host name aliases.
h_addrtype | AF_INET | This contains the address family; for Internet-based applications it will always be AF_INET.
h_length | 4 | This holds the length of the IP address, which is 4 for an Internet address.
h_addr_list | in_addr | For Internet addresses, the pointers h_addr_list[0], h_addr_list[1] and so on point to in_addr structures.

NOTE: h_addr is defined as h_addr_list[0] to keep backward compatibility.
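As an illustration of how the hostent fields are read, here is a sketch that uses gethostbyname(), the classic resolver call that returns this structure (newer code often prefers getaddrinfo()). The helper name first_ipv4_address is hypothetical.

```c
#include <string.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve name and copy its first IPv4 address
 * (Network Byte Order) into *out. Returns 0 on success, -1 on failure. */
int first_ipv4_address(const char *name, struct in_addr *out)
{
    struct hostent *he = gethostbyname(name);

    if (he == NULL || he->h_addrtype != AF_INET || he->h_addr_list[0] == NULL)
        return -1;
    /* Each h_addr_list entry points to h_length bytes, which is the size
     * of struct in_addr for AF_INET. */
    memcpy(out, he->h_addr_list[0], sizeof(*out));
    return 0;
}
```

The returned hostent points to static storage that later resolver calls may overwrite, which is why the address is copied out right away.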
The following structure is used to keep information related to services and their associated ports.
struct servent
{
  char  *s_name; 
  char  **s_aliases; 
  int   s_port;  
  char  *s_proto;
};
Here is the description of the member fields:
Attribute | Values | Description
s_name | http | This is the official name of the service. For example SMTP, FTP, POP3 etc.
s_aliases | ALIAS | This holds the list of service aliases. Most of the time this will be set to NULL.
s_port | 80 | This holds the associated port number, stored in Network Byte Order. For example, for HTTP this will be 80.
s_proto | TCP, UDP | This is set to the protocol used. Internet services are provided using either TCP or UDP.

 

Ports and Services


When a client process wants to connect to a server, the client must have a way of identifying the server it wants to connect to. If the client knows the 32-bit Internet address of the host on which the server resides, it can contact that host. But how does the client identify the particular server process running on that host?
To resolve the problem of identifying a particular server process running on a host, both TCP and UDP have defined a group of well-known ports.
For our purposes, a port will be defined as an integer number between 1024 and 65535. This is because all port numbers smaller than 1024 are considered well-known; for example, telnet uses port 23, http uses 80, ftp uses 21, and so on.
The port assignments for network services can be found in the file /etc/services. If you are writing your own server, care must be taken when assigning a port to it. You should make sure that the port is not already assigned to any other server.
Normally it is a practice to assign a port number greater than 5000. But many organizations have written servers that use port numbers above 5000. For example, Yahoo Messenger runs on 5050, SIP servers run on 5060, etc.

Example Ports and Services:

Here is a small list of services and associated ports. You can find the most up-to-date list of Internet ports and associated services at IANA - TCP/IP Port Assignments.
Service | Port Number | Service Description
echo | 7 | UDP/TCP sends back what it receives
discard | 9 | UDP/TCP throws away input
daytime | 13 | UDP/TCP returns ASCII time
chargen | 19 | UDP/TCP returns characters
ftp | 21 | TCP file transfer
telnet | 23 | TCP remote login
smtp | 25 | TCP email
time | 37 | UDP/TCP returns binary time
tftp | 69 | UDP trivial file transfer
finger | 79 | TCP info on users
http | 80 | TCP World Wide Web
login | 513 | TCP remote login
who | 513 | UDP different info on users
Xserver | 6000 | TCP X windows (N.B. >1023)

 

Port and Service Functions:


Unix provides the following functions to look up services in the /etc/services file.
·       struct servent *getservbyname(const char *name, const char *proto): This call takes a service name and a protocol name and returns the entry for that service, which includes its port number.
·       struct servent *getservbyport(int port, const char *proto): This call takes a port number (in Network Byte Order) and a protocol name and returns the entry for that service, which includes its name.
The return value for each function is a pointer to a structure with the following form (or NULL if no matching entry is found):
struct servent
{
  char  *s_name; 
  char  **s_aliases; 
  int   s_port;  
  char  *s_proto;
};
Here is the description of the member fields:
Attribute | Values | Description
s_name | http | This is the official name of the service. For example SMTP, FTP, POP3 etc.
s_aliases | ALIAS | This holds the list of service aliases. Most of the time this will be set to NULL.
s_port | 80 | This holds the associated port number, stored in Network Byte Order. For example, for HTTP this will be 80.
s_proto | TCP, UDP | This is set to the protocol used. Internet services are provided using either TCP or UDP.
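A lookup with getservbyname() might be wrapped as follows. This is a sketch; the helper name service_port is hypothetical, and the result depends on what the local services database (typically /etc/services) contains.

```c
#include <stddef.h>
#include <netdb.h>
#include <arpa/inet.h>

/* Hypothetical helper: return the port (host byte order) registered for
 * the given service and protocol, or -1 if there is no such entry. */
int service_port(const char *name, const char *proto)
{
    struct servent *se = getservbyname(name, proto);

    if (se == NULL)
        return -1;
    /* s_port is stored in Network Byte Order, so convert before use. */
    return ntohs((unsigned short)se->s_port);
}
```

On a system with a standard services file, service_port("http", "tcp") would be expected to return 80.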


Network Byte Orders


Unfortunately, not all computers store the bytes that comprise a multibyte value in the same order. Consider a 16-bit integer that is made up of 2 bytes. There are two ways to store this value.
·         Little Endian: In this scheme the low-order byte is stored at the starting address (A) and the high-order byte is stored at the next address (A + 1).
·         Big Endian: In this scheme the high-order byte is stored at the starting address (A) and the low-order byte is stored at the next address (A + 1).
So that machines with different byte order conventions can communicate, the Internet protocols specify a canonical byte order convention for data transmitted over the network. This is known as Network Byte Order, and it is big-endian.
When establishing an Internet socket connection, you must make sure that the data in the sin_port and sin_addr members of the sockaddr_in structure are represented in Network Byte Order.
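One way to see which scheme a machine uses is to store a known 16-bit value and look at the byte at the starting address. A minimal sketch follows; the function name host_is_little_endian is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return 1 if this machine stores the low-order
 * byte at the starting address (little endian), 0 otherwise. */
int host_is_little_endian(void)
{
    uint16_t value = 0x0102;       /* high-order byte 0x01, low-order 0x02 */
    unsigned char bytes[2];

    memcpy(bytes, &value, sizeof(value));
    return bytes[0] == 0x02;       /* low-order byte first => little endian */
}
```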

 

Byte Ordering Functions:


Routines for converting data between a host's internal representation and Network Byte Order are:
Function | Description
htons() | Host to Network Short
htonl() | Host to Network Long
ntohl() | Network to Host Long
ntohs() | Network to Host Short

Here is more detail about these functions:
·         unsigned short htons(unsigned short hostshort)
This function converts 16-bit (2-byte) quantities from host byte order to network byte order.
·         unsigned long htonl(unsigned long hostlong)
This function converts 32-bit (4-byte) quantities from host byte order to network byte order.
·         unsigned short ntohs(unsigned short netshort)
This function converts 16-bit (2-byte) quantities from network byte order to host byte order.
·         unsigned long ntohl(unsigned long netlong)
This function converts 32-bit quantities from network byte order to host byte order.
These functions are macros and result in the insertion of conversion source code into the calling program. On little-endian machines the code will change the values around to network byte order. On big-endian machines no code is inserted since none is needed; the functions are defined as null.
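The effect of the conversion can be observed by looking at the bytes in memory after htons(). The sketch below uses hypothetical helper names (port_to_wire, port_from_wire) and a 16-bit port as the example value.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: store a 16-bit port in Network Byte Order, so
 * that out[0] holds the high-order byte on any host. */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);

    memcpy(out, &net, sizeof(net));
}

/* Hypothetical helper: read back a port stored by port_to_wire(). */
uint16_t port_from_wire(const unsigned char in[2])
{
    uint16_t net;

    memcpy(&net, in, sizeof(net));
    return ntohs(net);
}
```

For port 8080 (0x1F90), the stored bytes are 0x1F followed by 0x90 regardless of the host's own byte order.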

 

IP Address Functions


Unix provides various function calls that help you manipulate IP addresses. These functions convert Internet addresses between ASCII strings (what humans prefer to use) and network byte ordered binary values (values that are stored in socket address structures).
The following three function calls are used for IPv4 addressing:
(1) int inet_aton(const char *strptr, struct in_addr *addrptr):
This function call converts the specified string, in the Internet standard dot notation, to a network address, and stores the address in the structure provided. The converted address will be in Network Byte Order (bytes ordered from left to right). This returns 1 if string was valid and 0 on error.
Following is the usage example:
#include <string.h>
#include <arpa/inet.h>
 
(...)
    int retval;
    struct in_addr addrptr;
    
    memset(&addrptr, '\0', sizeof(addrptr));
    /* retval is 1 if the string was valid, 0 on error */
    retval = inet_aton("68.178.157.132", &addrptr); 
 
(...)
(2) in_addr_t inet_addr(const char *strptr):
This function call converts the specified string, in the Internet standard dot notation, to an integer value suitable for use as an Internet address. The converted address will be in Network Byte Order (bytes ordered from left to right). This returns a 32-bit binary network byte ordered IPv4 address, or INADDR_NONE on error.
Following is the usage example:
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
 
(...)
 
    struct sockaddr_in dest;
 
    memset(&dest, '\0', sizeof(dest));
    /* the result is already in Network Byte Order */
    dest.sin_addr.s_addr = inet_addr("68.178.157.132"); 
 
(...)
(3) char *inet_ntoa(struct in_addr inaddr):
This function call converts the specified Internet host address to a string in the Internet standard dot notation.
Following is the usage example:
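#include <stdio.h>
#include <arpa/inet.h>
 
(...)
 
        char *ip;
 
        /* dest is the sockaddr_in filled in earlier; the returned string
           lives in a static buffer that the next call overwrites */
        ip=inet_ntoa(dest.sin_addr);
 
        printf("IP Address is: %s\n",ip);
 
(...)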
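Putting the functions together, a string can be parsed with inet_aton() and formatted back with inet_ntoa(). The following is only a sketch; the helper name normalize_ipv4 is hypothetical.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: parse a dotted-decimal string and write its
 * canonical dotted form into buf. Returns 0 on success, or -1 if the
 * string is not a valid address or buf is too small. */
int normalize_ipv4(const char *text, char *buf, size_t buflen)
{
    struct in_addr addr;
    const char *formatted;

    if (inet_aton(text, &addr) == 0)   /* 0 means the string was invalid */
        return -1;
    formatted = inet_ntoa(addr);       /* points at a static buffer */
    if (strlen(formatted) + 1 > buflen)
        return -1;
    strcpy(buf, formatted);            /* copy before the next inet_ntoa call */
    return 0;
}
```

Because inet_ntoa() reuses one static buffer, the result is copied into the caller's buffer before returning.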