COMP 4027 Forensic and Analytical Computing


Network forensics

In this section we look at how we can examine network-based applications and usage to find evidence.

Intrusion detection systems

Intrusion detection systems, or IDSs, are applications that examine network traffic and user activities to detect, and sometimes prevent, hostile or damaging actions [4].

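Intrusion detection systems can be divided into two major types according to how they behave when they detect an intrusion. Passive IDSs log the suspicious activity and may raise an alert. Reactive IDSs also respond to it, for example by blocking the offending traffic or closing the session.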

Of most interest to us in the forensic context are the passive IDSs that log suspicious behaviour.

Mainstream IDSs

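Aside from being passive or reactive, IDSs can be further categorised according to their method(s) of detecting intrusions. Rule-based IDSs compare activity against patterns (signatures) of known attacks. Baited IDSs, or honeypots, expose an apparently attractive target and record whatever an intruder does with it.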

Both of these types of IDS, rule-based and baited, can be used in either passive or reactive mode. However, honeypots are by their nature best suited to evidence gathering. They give an intruder what appears to be free access to information that will not damage the system, while capturing clear evidence of the intruder's behaviour.
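The following minimal Python sketch of a rule-based IDS makes the passive/reactive distinction concrete. The signature names and patterns in `SIGNATURES`, the `RuleBasedIDS` class and its `inspect` method are all hypothetical. Real rule-based systems such as Snort use far richer rule languages. In passive mode a match is only logged; in reactive mode the source host is also blocked.

```python
import re
from dataclasses import dataclass, field

# Hypothetical signatures: rule name -> pattern that marks a known attack
# inside an HTTP request line. Real rule sets are far larger and richer.
SIGNATURES = {
    "path-traversal": re.compile(r"\.\./"),
    "sql-injection": re.compile(r"(?i)union\s+select"),
    "shell-probe": re.compile(r"/bin/sh"),
}


@dataclass
class RuleBasedIDS:
    reactive: bool = False                       # False = passive: log only
    log: list = field(default_factory=list)      # evidence trail (host, rule, request)
    blocked: set = field(default_factory=set)    # hosts refused in reactive mode

    def inspect(self, host: str, request: str) -> bool:
        """Check one request; return True if it is allowed through."""
        if host in self.blocked:
            return False
        for name, pattern in SIGNATURES.items():
            if pattern.search(request):
                # Both modes record the first matching rule as evidence.
                self.log.append((host, name, request))
                if self.reactive:
                    self.blocked.add(host)   # react: refuse this host from now on
                    return False
                return True                  # passive: note it, let it through
        return True
```

With `reactive=False`, a request such as `GET /../../etc/passwd HTTP/1.0` is let through but logged as evidence. With `reactive=True`, the same request is refused and every later request from that host is refused too. The log is identical in both modes, which is why passive operation loses nothing from a forensic point of view.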

In Assignments 1 and 3 we use honeypots to detect and monitor intrusions.

Experimental IDSs

Ongoing research is exploring novel ways to detect intrusions. Two such approaches are described here. One takes its inspiration from the human immune system. The other is based on the user modelling more commonly found in adaptive learning systems and recommender systems.

Biologically-inspired IDSs

The human immune system is an excellent example of an intrusion detection system, identifying and dispatching intruders, which in this context are pathogens.

A colleague of mine, Julie Greensmith at the University of Nottingham, is investigating danger theory. To paraphrase her words in [12], danger theory relies on human dendritic cells being able to discriminate between two kinds of cell signal. The first comes from apoptosis: controlled cell death, as when white blood cells sacrifice themselves in what is known as "cell suicide". The second comes from necrosis: uncontrolled cell death caused by an external agent, such as a virus destroying a cell by rupturing its cell membrane. Chemical signals from necrotic cells indicate abnormal damage (i.e. danger). The immune system will then eliminate any antigens (antibody-generating substances, often pathogens) found in the location of the necrotic cells.

She further writes [12]:

Metaphorically, natural [dendritic cells] are the crime-scene investigators of the human immune system, traversing the tissue for evidence of damage - namely signals, and for potential suspects responsible for the damage, namely antigen. As with all things biological, it takes multiple DCs presenting multiple antigens to multiple effector T-cells for an actual response to be mounted.

She is trying to develop algorithms that mimic this process. They use multiple damage/danger detectors which collectively trigger a response to a perceived threat only when enough detectors agree that a threat exists. There is one type of detector for each event type, and each detector responds only to its own type. Requiring agreement among many detectors helps alleviate the problem of excessive false positives that affects statistical methods.
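The quorum idea can be illustrated with a deliberately simplified Python sketch. It is not Greensmith's Dendritic Cell Algorithm. The `Detector` class, the assumption that each detector patrols one `location`, and the `quorum` parameter are all hypothetical, chosen only to show how agreement between independent detectors filters out isolated alarms.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    event_type: str   # the single event type this detector responds to
    location: str     # hypothetical: the host or segment it patrols

    def signals(self, events_by_location: dict) -> bool:
        """True if this detector saw its own event type at its own location."""
        return self.event_type in events_by_location.get(self.location, [])


def confirmed_threats(detectors, events_by_location, quorum):
    """Event types for which at least `quorum` detectors independently signal."""
    agreeing = {}
    for d in detectors:
        if d.signals(events_by_location):
            agreeing[d.event_type] = agreeing.get(d.event_type, 0) + 1
    return {etype for etype, n in agreeing.items() if n >= quorum}


# Example: three hosts, each watched by a port-scan and a bad-login detector.
detectors = [Detector(t, h) for t in ("port-scan", "bad-login") for h in "abc"]
events = {"a": ["port-scan", "bad-login"], "b": ["port-scan"], "c": ["port-scan"]}
print(confirmed_threats(detectors, events, quorum=2))   # {'port-scan'}
```

In this example the single bad-login signal, which may well be a false positive, does not trigger a response. The port scan is seen independently at three locations and does.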

The human immune system also has a form of "memory" of events, if not of exact dates, represented by the antibodies present in the bloodstream: the higher the level of antibodies, the more recent the event. This can be mimicked by programming a life span into the damage/danger detectors so that they terminate when they are no longer needed. Conversely, a detector should be able to spawn new copies of itself when there is a perceived threat, so that consensus on the threat can be reached. In this way the number and type of damage/danger detectors provide information about the threats encountered. Accurate timing and other information can be stored in the detectors themselves, for example their date of creation, which is triggered when one of their own kind perceives a threat. The date/time of the most recent event can then be deduced from the most common creation date among the detectors. If many threats have occurred, the creation dates in the detector population will form several distinct clusters.
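The following is a minimal Python sketch of the life-span-and-spawning scheme. It rests on assumed simplifications: time is an integer step, `LIFESPAN` is an arbitrary hypothetical constant, and every surviving detector of a threatened type spawns exactly one copy per threat. The names are illustrative and are not taken from any published algorithm.

```python
from collections import Counter
from dataclasses import dataclass

LIFESPAN = 100   # hypothetical: time steps a detector survives without being used


@dataclass
class Detector:
    event_type: str
    created: int      # time step at which this detector was spawned
    last_used: int    # time step at which it last responded to a threat


def step(population, now, threats):
    """Advance the detector population to time `now`.

    Detectors idle for longer than LIFESPAN die off. For each threat type seen
    at `now`, every surviving detector of that type is refreshed and spawns
    one copy stamped with the current time.
    """
    survivors = [d for d in population if now - d.last_used <= LIFESPAN]
    offspring = []
    for d in survivors:
        if d.event_type in threats:
            d.last_used = now
            offspring.append(Detector(d.event_type, created=now, last_used=now))
    return survivors + offspring


def event_times(population, event_type):
    """(creation time, count) pairs, most common first; clusters mark past events."""
    return Counter(d.created for d in population
                   if d.event_type == event_type).most_common()


pop = [Detector("scan", created=0, last_used=0)]   # one seed detector
pop = step(pop, 10, {"scan"})
pop = step(pop, 20, {"scan"})
print(event_times(pop, "scan"))   # [(20, 2), (0, 1), (10, 1)]
```

Each threat doubles the detectors of its type, so the most recent event always accounts for the largest cluster of creation times. This matches the antibody analogy. If no further threats arrive, every detector eventually exceeds `LIFESPAN` and the population dies out.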

Behavioural IDSs

Behavioural IDSs are an interesting combination of intrusion detection and adaptive/recommender systems.

Adaptive hypermedia systems [13] and recommender systems [14] are applications that in some way tailor the content of a Web page presented to a user according to a user model that records relevant details of the user and their interaction with the system.

User models can be either persistent, when they are saved for later reuse or comparison, or sessional, when they are not saved persistently (except perhaps as a contribution to statistical analysis). Sessional user models tend not to record any personally identifying data, whereas persistent ones do. The Amazon bookseller website is a good example of both types of user model, used in a recommender-system context.

In adaptive hypermedia systems and recommender systems, the user model is consulted by the system which alters the presentation or the content of the information according to what it believes the user requires. The variables of the user model are updated by the user's interactions with the system or by the user themselves explicitly inputting information about their preferences or other characteristics.
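The consult-and-update loop can be shown with a toy Python user model. The `UserModel` class and its topic-weight variables are hypothetical. Real adaptive hypermedia and recommender systems use much richer models. The sketch only shows the two kinds of update (implicit, from interaction, and explicit, from the user) and the system consulting the model to choose what to present.

```python
from collections import Counter


class UserModel:
    """Toy user model: one interest weight per topic (hypothetical variables)."""

    def __init__(self):
        self.interests = Counter()

    def record_click(self, topic):
        """Implicit update: the user's interaction raises interest in a topic."""
        self.interests[topic] += 1

    def set_preference(self, topic, weight):
        """Explicit update: the user states a preference directly."""
        self.interests[topic] = weight

    def recommend(self, items):
        """Consult the model: order (title, topic) items by interest, highest first."""
        return sorted(items, key=lambda item: self.interests[item[1]], reverse=True)


model = UserModel()
for topic in ("crime", "crime", "cookery"):
    model.record_click(topic)
catalogue = [("Soup basics", "cookery"), ("Forensics 101", "crime"), ("Roses", "garden")]
print(model.recommend(catalogue)[0])   # ('Forensics 101', 'crime')
```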

Intrusion detection based on user models is still very much at an experimental stage. Generally it can be characterised as a kind of statistical anomaly-based IDS, in which the statistics are gathered on user behaviour rather than on network traffic. These approaches maintain two models. A canonical user model represents the trusted user. A sessional user model represents the activity and behaviour of whoever is currently operating under that user ID. The sessional model is compared with the canonical model, and if there is a significant discrepancy between the two, an alert is generated. The canonical user model is of course persistent, while the sessional user model is sessional (although pertinent data is retained for evidence or statistical analysis).
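One simple way to quantify the discrepancy between the two models is to compare how often each kind of action occurs. The Python sketch below makes several illustrative choices, none of them an established method:

- a user model is nothing more than a frequency profile of action names;
- total variation distance serves as the discrepancy measure;
- the alert threshold of 0.5 is arbitrary and hypothetical.

```python
from collections import Counter


def profile(actions):
    """Relative frequency of each action type in a sequence of actions."""
    counts = Counter(actions)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()} if total else {}


def discrepancy(canonical, sessional):
    """Total variation distance between profiles: 0 = identical, 1 = disjoint."""
    keys = canonical.keys() | sessional.keys()
    return 0.5 * sum(abs(canonical.get(k, 0.0) - sessional.get(k, 0.0)) for k in keys)


def check_session(canonical, session_actions, threshold=0.5):
    """Return (alert, score); alert is True when the session departs too far."""
    score = discrepancy(canonical, profile(session_actions))
    return score > threshold, score


# Canonical (persistent) model built from the trusted user's history.
canonical = profile(["email"] * 6 + ["edit"] * 3 + ["compile"])
print(check_session(canonical, ["email", "email", "edit"]))     # low score, no alert
print(check_session(canonical, ["ssh"] * 4 + ["email"]))        # high score, alert
```

The first session resembles the trusted user's habits. The second is dominated by an action the trusted user never performs, so it raises an alert, and the session data itself can be retained as evidence.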

In intrusion detection, user models can be applied in different ways.

Behavioural IDSs can be used both to prevent intrusion and to gather evidence. Since the sessional user model (including the user's activities) is recorded in order to update the user model, the data gathered can also contribute to evidence.

Web log analysis

Intrusion detection systems are applications that set out to capture information about criminal activity or misconduct. However, there are many other sources of information that can yield evidence if properly analysed. Of these, Web logs are the most significant source of network information on human activities.

Web logs contain details of the time, date and URL being accessed, along with the calling workstation (usually identified by its IP address), the file type and the request type. The following excerpt shows some typical Web log records:
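The exact record layout depends on how the server is configured. The minimal Python sketch below shows how each record can be split into the fields listed above, using `parse_clf`, a hypothetical helper. It assumes the Common Log Format (CLF), which Apache httpd and many other servers can write. The sample record is invented.

```python
import re
from datetime import datetime

# Common Log Format:
# host ident authuser [dd/Mon/yyyy:hh:mm:ss zone] "METHOD url protocol" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Parse one CLF record into a dict of fields, or return None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])   # "-" = no body sent
    return rec


# Invented sample record for illustration.
rec = parse_clf('192.0.2.7 - - [22/Mar/2009:10:15:32 +1030] '
                '"GET /courses/comp4027/index.html HTTP/1.1" 200 5120')
print(rec["host"], rec["method"], rec["url"], rec["time"].isoformat())
# 192.0.2.7 GET /courses/comp4027/index.html 2009-03-22T10:15:32+10:30
```

Parsing the timestamp into a timezone-aware `datetime` matters forensically. Records from servers in different time zones can then be placed on a single timeline.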

Probably one of the most remarkable mishaps in search engine history so far is the "accidental" release by AOL in 2006 of around three months' worth of search log data, colloquially known as "the AOL 500k". There were apparently 500,000 distinct users (hence "500k"), although some estimates put the number at 650,000. The information recorded included a pseudonymised user ID (consistent throughout the data) along with search strings, times and dates.

There was serious outrage at the release of this data. Many people pointed out that pseudonymising the user IDs was largely useless: because each pseudonym was consistent throughout the data, anyone could look at all the searches made by a given person across the entire log. It is well known in data mining that individual pieces of information reveal little about a person, but combining them (as in federated databases) makes it easy to build up a detailed picture of that person. The AOL data could easily be enough for identity theft [8], and at the very least could expose the identity of the people making the searches.
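The linkage problem is easy to demonstrate. In this Python sketch the records, pseudonyms and queries are all invented. Grouping by a consistent pseudonym is all it takes to assemble one person's search history.

```python
from collections import defaultdict


def histories_by_pseudonym(records):
    """Group (pseudonym, timestamp, query) records into one timeline per pseudonym."""
    histories = defaultdict(list)
    for pseudonym, when, query in records:
        histories[pseudonym].append((when, query))
    for timeline in histories.values():
        timeline.sort()   # ISO-style timestamps sort chronologically as strings
    return dict(histories)


# Invented records in the spirit of a search log with consistent pseudonyms.
records = [
    ("4417", "2006-03-09 11:02", "springfield dentist opening hours"),
    ("9030", "2006-03-02 09:13", "cheap flights to denver"),
    ("4417", "2006-03-02 09:12", "plumbers in springfield"),
    ("4417", "2006-03-05 20:41", "springfield high class of 1985 reunion"),
]
for when, query in histories_by_pseudonym(records)["4417"]:
    print(when, query)
```

No single query identifies user 4417. Taken together, however, the queries suggest where the person lives and roughly how old they are, which is exactly the aggregation effect described above.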

There are very real privacy concerns with the public release of data of this nature, but such data can be a valuable tool for anyone gathering evidence of misconduct or criminal activity. The AOL 500k data shows behaviour ranging from the eccentric to the outright criminal [10], such as a user who repeatedly makes queries suggesting an intention to commit murder [9]. The public can only guess at the identity of the pseudonymised searchers from their search strings, whereas law enforcement agencies can generally gain access to complete, unaltered Web logs. For example, in the UK the Regulation of Investigatory Powers Act 2000 made ISPs responsible for keeping all logs for extended periods of time so that law enforcement agencies could later use them if necessary. In the USA, however, Google resisted such a handover [11], citing user privacy.

Not everyone is as cavalier about personal privacy. In 2008 the Security Lab [0] requested access to Web logs and clickthroughs. The request included a pseudonymising process that changed each individual's pseudonym every 24 hours and screened out every Web access that passed data to scripts. Alas, it was rejected by the Ethics Committee.
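A pseudonymising process of this kind can be sketched with a keyed hash (HMAC) over the user ID and the calendar day. A user's pseudonym is then stable within a day but changes at midnight, and it cannot be linked across days without the secret key. This is an illustrative assumption, not the Security Lab's actual procedure, which is not described here. `SECRET_KEY`, `daily_pseudonym` and `passes_screen` are hypothetical names. "Passing data to scripts" is approximated here as a POST request or a URL carrying a query string.

```python
import hashlib
import hmac
from datetime import date
from urllib.parse import urlsplit

# Hypothetical secret held only by the data owner; without it, pseudonyms
# from different days cannot be matched up.
SECRET_KEY = b"replace-with-a-long-random-secret"


def daily_pseudonym(user_id: str, day: date) -> str:
    """Keyed hash of user and calendar day: stable for one day, new the next."""
    message = f"{user_id}|{day.isoformat()}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]


def passes_screen(url: str, method: str) -> bool:
    """Keep an access only if it sends no data to a script (no POST, no query string)."""
    return method.upper() != "POST" and urlsplit(url).query == ""


print(daily_pseudonym("alice", date(2008, 5, 1)) ==
      daily_pseudonym("alice", date(2008, 5, 2)))             # False
print(passes_screen("/search.php?q=forensics", "GET"))        # False
```

Compared with the consistent pseudonyms of the AOL release, a daily pseudonym limits any one profile to a single day's activity. A keyed hash is used rather than a plain hash because anyone could otherwise recompute a plain hash of a guessed user ID.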

However, others are happy to release data for clickthrough analysis. Microsoft, for example, recently released to the Security Lab [0] the complete MSN Web search data plus clickthroughs for research purposes. This comprises around 15 million queries.


There are many other network-based applications that can yield information about network traffic and usage. Further examples include revision control systems and databases, which can keep detailed records of all their users' activities. Social networking sites such as Facebook also hold a wealth of often quite personal information. Unwary users record indiscreet details about themselves there, including photographs that can incriminate them. Even relatively innocuous content in a Facebook entry can reveal significant information about an individual's friends and colleagues. Blogs also contain much detail that might incriminate someone.

Some resources

Useful reference materials:
  [0] Security Lab, UniSA
  [1] Haystack: an intrusion detection system
  [2] Artificial Immune Systems
  [3] Intrusion Detection
  [4] Intrusion Detection System
  [5] AOL Proudly Releases Massive Amounts of Private Data
  [6] Wendy's Web Search Blog, with various interfaces to the AOL 500k data
  [7] AOL Search Records and User Privacy
  [8] AOL Releases Private Search Logs For Over 500,000 Searchers
  [9] AOL Search Data Shows Users Planning to Commit Murder
  [10] The really weird Something Awful website, with a collection of user traces from the AOL 500k data. Beware: some of the searches concern rather unpleasant subjects.
  [11] Feds take porn fight to Google
  [12] Research
  [13] Adaptive Hypermedia
  [14] Recommender system

Last update hla 2009-03-22