Nowadays, the Internet can be seen as an ever-changing platform where new and different types of services and applications are constantly emerging. In fact, many of today's dominant applications, such as social networks, appeared only recently and were rapidly adopted by the user community. These new applications required the implementation of novel communication protocols with different network requirements, according to the service they deploy. This diversity and novelty has led to an increasing need to accurately profile Internet users by mapping their traffic to the originating application, in order to improve network management tasks such as resource optimization, network performance, service personalization and security. However, accurately mapping traffic to its originating application is difficult, owing to the inherent complexity of existing network protocols and to several restrictions that prevent analysis of the contents of the generated traffic. Technologies such as traffic encryption are widely deployed to protect the confidentiality and integrity of communications over the Internet. In addition, legal constraints forbid the analysis of clients' traffic in order to protect their confidentiality and privacy. Consequently, novel traffic discrimination methodologies are necessary for accurate traffic classification and user profiling. This thesis proposes several identification methodologies for accurate Internet traffic profiling that cope with the restrictions mentioned above and with existing encryption techniques. By analyzing the frequency components present in the captured traffic and inferring the presence of the different network and user related events, the proposed approaches are able to create a profile for each of the analyzed Internet applications. 
The use of several probabilistic models allows the accurate association of the analyzed traffic with the corresponding application. Several enhancements are also proposed to allow the identification of hidden illicit patterns and the real-time classification of captured traffic. In addition, a new network management paradigm for wired and wireless networks is proposed. The analysis of layer 2 traffic metrics and of the different frequency components present in the captured traffic allows efficient user profiling in terms of the web application used. Finally, some usage scenarios for these methodologies are presented and discussed.
Publisher: Universidade de Aveiro