If you are interested in any of these projects, please contact Helen Ashman to discuss.
In this project you will investigate whether image searches show similar differences: whether images persist longer in the top N results, whether the rankings change more suddenly, and how much overlap there is between the search engines' image results.
This project involves setting up a honeypot and then setting up profile capturing software to create intruder profiles.
An extension of this project is to try to characterise intruders by determining whether their collective behaviour differs from that of normal users of the network. For example, do they change directory and list contents more often than normal users? Do they use the su or sudo commands more often? The aim is to see whether we can characterise 'intruder behaviour'.
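The characterisation above could start from simple command-frequency profiles. The sketch below is a minimal illustration, assuming session command logs are available as lists of command names; the commands of interest and the example sessions are illustrative, not real honeypot data.

```python
from collections import Counter

# Commands the project text suggests intruders may use more often (illustrative set).
SUSPICIOUS = {"cd", "ls", "su", "sudo"}

def profile(commands):
    """Return the relative frequency of each command in a session."""
    counts = Counter(commands)
    total = sum(counts.values())
    return {cmd: n / total for cmd, n in counts.items()}

def suspicion_score(commands):
    """Fraction of a session spent on directory listing / privilege commands."""
    return sum(freq for cmd, freq in profile(commands).items() if cmd in SUSPICIOUS)

# Hypothetical sessions for illustration only.
normal_session = ["vim", "make", "ls", "python", "git", "make"]
intruder_session = ["cd", "ls", "cd", "ls", "sudo", "cat", "cd", "su"]

print(suspicion_score(normal_session))    # low for ordinary development work
print(suspicion_score(intruder_session))  # higher for exploration-heavy sessions
```

A real profile would of course use many more features (timing, argument patterns, file targets) and a proper classifier rather than a single ratio.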
This project will be jointly supervised by Dr Raymond Choo.
In a chatroom version of the behavioural intrusion detection system, users of a chatroom can be reauthenticated according to their normal behaviour, so that if someone is using another person's credentials, they will be detected.
We have some chatroom data available to us. That data also includes non-human users of the system so another outcome is to be able to detect any non-human users from their profiles, a sort of Turing Test.
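As a starting point for the non-human-user detection, two cheap behavioural features are message timing regularity and vocabulary diversity. The sketch below is illustrative only: the thresholds and the feature set are assumptions, not results from the chatroom data mentioned above.

```python
import statistics

def features(timestamps, messages):
    """Simple behavioural features for one chat user (illustrative)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    words = " ".join(messages).split()
    return {
        # Automated posters often send messages at very regular intervals.
        "gap_stdev": statistics.stdev(gaps) if len(gaps) > 1 else 0.0,
        # Automated posters often recycle a small fixed vocabulary.
        "vocab_ratio": len(set(words)) / len(words) if words else 0.0,
    }

def looks_automated(timestamps, messages, gap_threshold=1.0, vocab_threshold=0.3):
    """Flag a user whose timing is rigid AND whose vocabulary is narrow."""
    f = features(timestamps, messages)
    return f["gap_stdev"] < gap_threshold and f["vocab_ratio"] < vocab_threshold
```

A genuine 'Turing Test' profile would combine many such features and be trained against the labelled non-human users in the available data.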
http://www.bbc.co.uk/news/technology-22529357
This news article is about the autocomplete function of a search engine, which is where you start typing into the little search window and a number of possibilities pop up, based on what you are typing. It is based on what other people have typed in previous searches, and tends to list the most popular searches with those opening characters. In a related news article (linked underneath the main story), it seems that someone has deliberately created an entry in the autocomplete function by sending in a scurrilous query many times so that it appears in the autocomplete.
So it seems that Google is open to 'gaming' not only in its search results, by the search engine optimisation companies, but also in its autocomplete function - just get enough people or processes typing in a given search and it will start to pop up. The project I want to propose is: how quickly can we 'plant' an autocomplete entry? How many times do we need to submit a query before an autocomplete entry appears?
This would involve setting up some new queries within software that we already have working in the lab so that we can disguise the origin of the query and randomise its frequency, so repeated queries will look like many different people. Of course, we wouldn't want to create a defamatory query. Instead we could create a completely made-up situation, perhaps something positive but false such as associating solar activity with increased happiness - yes I know it sounds wacky but there are quite a lot of results on Google for 'solar activity with increased happiness', and they are not (yet) coming up in the autocomplete function (except in my own local autocomplete as a result of typing it in once).
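The "randomise its frequency" part of the setup above might look something like the sketch below. It only illustrates the scheduling idea; `submit` is a placeholder for whatever the lab's existing query software actually calls, and the origin-disguising (proxies, varied IP addresses) is handled elsewhere, not here.

```python
import random
import time

def schedule_queries(query, n, mean_gap=600.0, submit=print):
    """Submit `query` n times at randomised intervals.

    Exponentially distributed gaps mimic independent users arriving at
    random (a Poisson process), so repeated submissions do not form an
    obvious regular pattern. `submit` is a stand-in for the real
    query-submission routine.
    """
    for _ in range(n):
        submit(query)
        time.sleep(random.expovariate(1.0 / mean_gap))
```

For example, `schedule_queries("solar activity with increased happiness", 100)` would spread 100 submissions over roughly 100 ten-minute gaps.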
In this project, you will help run a larger experiment to statistically confirm the mutual relevance assumption. The experiment will also generate information about the relationships between different sorts of queries, and will help us look for patterns in those relationships. Some queries are definitely about the same thing (e.g. Castle Pernstejn and Hrad Pernstejn), while others are a similar topic but not the same thing (e.g. iPad and MacBook).
In this project you will be using what is called 'cluster overlap by population' to merge any clusters that have the same label and share many contained pages but are not already merged. Once we have these, we can then use the cluster overlap method to discover synonyms (where the search terms are in the same language) and translations (where they are in different languages).
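The population-overlap step might be sketched as below, using Jaccard overlap between clusters' page sets. This is a minimal illustration: the 0.5 threshold is an assumption, the label-matching precondition is omitted for brevity, and the synonym/translation discovery stage would sit on top of it.

```python
def overlap(a, b):
    """Jaccard overlap between the page populations of two clusters."""
    return len(a & b) / len(a | b)

def merge_by_population(clusters, threshold=0.5):
    """Greedily merge clusters whose page populations overlap heavily.

    `clusters` maps a cluster label to a set of page identifiers.
    Labels whose clusters share many pages (e.g. 'Castle Pernstejn'
    and 'Hrad Pernstejn') are candidates for merging.
    """
    clusters = {label: set(pages) for label, pages in clusters.items()}
    merged = True
    while merged:
        merged = False
        labels = list(clusters)
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                if overlap(clusters[a], clusters[b]) >= threshold:
                    # Record the merge under a combined label.
                    clusters[(a, b)] = clusters.pop(a) | clusters.pop(b)
                    merged = True
                    break
            if merged:
                break
    return clusters
```

Greedy pairwise merging is just the simplest choice here; the project would likely want a more principled agglomerative scheme.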
In this project you will be working to collect some data and answer some of these questions. We will analyse the difference between personalised and non-personalised results, and discover whether people are more satisfied with personalised results.
'If you're not paying for it, you're not the customer. You're the product being sold.' This observation might also be made of search engines. They amass large quantities of personal data from every user of their services. It is unlikely that the business model of search engines is philanthropic, so clearly their income derives from other sources, such as advertising. Targeted advertising is increasingly popular with advertisers as it offers access to better sales prospects than blanket advertising, but to achieve this, personal information about users must be available. Companies such as Experian in the UK sell such data for marketing purposes, and with targeted advertising now appearing alongside normal search results, it is evident that search engines are using the personal data they collect to provide targeted search opportunities to their advertisers. In fact, it is even claimed that some search engine corporations are no longer in the search business but rather are now in the marketing business. Given this viewpoint, one might wonder if the personalisation of content is offered not to improve the user experience (as it may do quite the opposite) but to offer a publicly palatable reason for the collection of enormous quantities of personal data.
Under Australia's National Privacy Principles and the European Union's Data Privacy Directive, the user's informed consent must be sought prior to data collection. However, the user cannot rely on corporations to abide by local privacy laws, especially when data is managed offshore by international corporations. It also seems that even where privacy laws explicitly forbid certain types of data collection, it still takes place (as evidenced by the illegal collection of household network data by Google StreetView from many countries during 2010 and, more recently, the apparently inadvertent bypassing of 'do not track' browser instructions).
All this means that the user must be vigilant about outgoing communications from their personal devices. In some cases, personal data is deliberately released by users, such as on social networks. However, this proposal focuses on cases where personal data is collected surreptitiously. At present the control over what data is collected lies largely with the search engines themselves.
This project proposes a proxy architecture that will reverse that situation, so that the release of personal data is governed at the user's end, not at the search engine end. The user will still be able to make use of personalised search; however, that personalisation will be performed by the proxy, which runs locally to the user, not by the search engine. Use of Amazon's cloud services and/or the Tor anonymising service will ensure that location information is suppressed.
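One small piece of such a proxy is the filtering step that strips identifying information before a query leaves the user's end. The sketch below is illustrative only: requests are assumed to be represented as header dictionaries, and the header list is an example, not a complete tracking inventory.

```python
# Headers commonly used to identify or track users (illustrative set).
TRACKING_HEADERS = {"cookie", "referer", "user-agent", "x-forwarded-for"}

def scrub(headers):
    """Drop identifying headers before a query leaves the proxy.

    Location suppression (routing via cloud egress points or Tor)
    would happen at the network layer, not in this filtering step.
    """
    return {k: v for k, v in headers.items() if k.lower() not in TRACKING_HEADERS}
```

The harder part of the architecture, re-ranking results locally against a user profile that never leaves the user's machine, would build on top of a filter like this.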