The Data Transparency Lab initiative has selected its first six projects from among the more than 60 submitted to this first call
Since the launch of the first Data Transparency Lab workshop in November of last year, we have come a long way. Today we are very excited to reveal the first six projects that will be funded by the DTL, with a total endowment of 300,000 euros (50,000 euros each).
All selected projects embody the core purposes and ambitions of the DTL: to support research into tools, data and methodologies that shed light on how online services use personal data, and to give people greater control over their data. These grants are intended to finance, in whole or in part, the work of a principal investigator (PI) and at least one doctoral or postdoctoral student for a period of approximately one year.
For those who don’t know it yet, the DTL is a community initiative, created by Telefónica, Mozilla, MIT and the ODI, to reveal the flow and use of personal data on the web and to explore ways to make the data economy more transparent and respectful in the future. It is made up of academics, public institutions, startups and large companies, with the aim of creating an independent research community.
Following the call in April this year, we received and reviewed more than 60 applications. We created a committee of specialized researchers, chaired by Krishna Gummadi (Max Planck Institute for Software Systems) and Nikolaos Laoutaris (Telefónica) and made up of data and privacy experts from academic institutions such as the universities of Boston, Princeton and Cambridge, as well as organizations and companies such as AT&T and INRIA. This committee was in charge of evaluating each proposal and determining its relevance to the following research areas:
- Reverse engineering the use of personal data in online services (e.g., advertising, referral services, pricing or availability of products and information)
- Detection of personal data collections by online services
- Privacy-preserving analytics and personal data management tools
- User and societal awareness of data use
The shortlisted proposals were forwarded to the DTL board, which made the final decision on the six winning proposals announced today (details below). The DTL board is made up of the four DTL founders: Professor Sandy Pentland of MIT, Alina Hua of Mozilla, Pablo Rodríguez of Telefónica and Jeni Tennison of the Open Data Institute. The winning project teams will receive funding to begin building the proposed tools and platforms, and will report on their progress at the DTL Conference 2015 in November.
In an age where the power and importance of data are indisputable, but where there are understandably many fears surrounding data privacy and use, we have to strike a balance: one that recognizes the value of data but gives people control over it, as well as the transparency they deserve. We believe that the DTL, a collaborative project between companies, organizations and academic institutions, is a step in the right direction, and we are confident that, with the funding of this first group of projects, data transparency can begin to become a reality and we can maintain trust in our digital society. The future of the web depends on it.
Details of the DTL 2015 projects
The six winning proposals comprise five tools and one platform. Each will receive 50,000 euros and a year of support.
Raising awareness about data-driven privacy
Lorrie Faith Cranor (Carnegie Mellon University) and Blase Ur (Carnegie Mellon University)
“We will create and test a data-driven privacy tool that lets users explore precisely which pages the different companies have tracked them on, as well as what those companies may have inferred about their interests. In addition to releasing a fully functional, open-source privacy tool, we will conduct a 75-participant, two-week field study to compare visualizations of personalized tracking data.”
Showing and controlling mobile privacy leaks
David Choffnes (Northeastern University), Christo Wilson (Northeastern University) and Alan Mislove (Northeastern University)
“Improving privacy in a context of widespread connectivity and abundant sensors requires trusted third-party systems that allow auditing and control of personally identifiable information (PII) leaks.
First, we will explore how to use machine learning to reliably identify PII in network flows, along with algorithms that incorporate user feedback to adapt to ongoing changes in privacy leaks. Second, we will build tools that let users control how their information is (or is not) shared with others. These tools will be released as free, open-source applications that can work in various deployment scenarios, such as a device on the user’s home network or a cloud-based virtual environment.”
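As a loose illustration of that first step, one simple (and far from complete) way to spot PII in a network flow is to search outgoing payloads for a user's known identifiers in the plain and encoded forms trackers commonly use. Everything below, the identifiers, the payload, and the choice of encodings, is a made-up sketch, not the project's actual system:

```python
import base64
import hashlib
import urllib.parse

def encodings_of(value: str) -> set[str]:
    """Return the plain value plus a few encoded forms it may leak as."""
    return {
        value,
        urllib.parse.quote(value, safe=""),         # URL-encoded
        base64.b64encode(value.encode()).decode(),  # Base64
        hashlib.md5(value.encode()).hexdigest(),    # hashed identifier
    }

def find_pii_leaks(payload: str, pii_values: list[str]) -> list[str]:
    """Return the PII values whose plain or encoded form appears in a payload."""
    return [
        value for value in pii_values
        if any(enc in payload for enc in encodings_of(value))
    ]

# Hypothetical user identifiers: an e-mail address and a made-up IMEI.
pii = ["alice@example.com", "355402091234567"]
payload = "GET /track?uid=YWxpY2VAZXhhbXBsZS5jb20= HTTP/1.1"
print(find_pii_leaks(payload, pii))  # → ['alice@example.com']
```

A real system must also handle compressed and encrypted traffic and identifiers it has never seen, which is precisely why the project proposes machine learning rather than fixed string matching.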
FDVT: a personal data assessment tool for Facebook users
Angel Cuevas (Carlos III University of Madrid) and Raquel Aparicio (Carlos III University of Madrid)
“The objective of this project is to develop a tool that informs people, in real time, of the economic value of the personal information associated with their browsing activity. Given the complexity of this issue, the scope of the tool for this particular project will be restricted to Facebook in order to, among other things, inform its users of the value they generate for Facebook. We will call it the Facebook Data Valuation Tool (FDVT).”
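To make the idea concrete, the value a browsing session generates can be roughly approximated from the ad impressions and clicks it produces. The rates and function below are invented placeholders for illustration, not the FDVT's actual valuation model:

```python
def session_value(impressions: int, clicks: int,
                  cpm: float = 0.25, cpc: float = 0.75) -> float:
    """Estimated revenue in euros for one session: impressions are billed
    per mille (CPM) and clicks individually (CPC). Rates are made up."""
    return impressions * cpm / 1000 + clicks * cpc

# A session with 40 ad impressions and one click on these toy rates:
print(round(session_value(impressions=40, clicks=1), 2))  # → 0.76
```

A real-time tool would update this estimate as the page loads, using rates that vary by audience and ad placement.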
The digital aura: raising awareness about browsing history
Arkadiusz Stopczynski (Technical University of Denmark), Mieszko Piotr Manijak (Technical University of Denmark), Piotr Sapiezynski (Technical University of Denmark) and Sune Lehmann (Technical University of Denmark)
“Our Internet browsing history is highly personal. Our search terms and the pages we visit reveal our fears, interests, afflictions, and secret ambitions.
A few years ago, the Immersion project created at the MIT Media Lab received wide international press coverage for visualizing the latent social information contained in the header data of our e-mails. We want to do something similar for web browsing. Using topic models, we want to design a simple dashboard where people can view the content of their browsing and observe how topics change over time. We will then combine this visualization with information about the trackers involved (how many trackers there are, how much information they receive), so that users can see the implications that tracking has for them.”
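For a flavor of the "topics over time" idea, here is a toy stdlib-only stand-in. The project proposes proper topic models learned from the data; the fixed keyword lexicon, the page titles, and the per-week bucketing below are all invented for illustration:

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical topic lexicon; a real system would learn topics from the data.
TOPICS = {
    "health": {"symptom", "clinic", "diet"},
    "travel": {"flight", "hotel", "visa"},
}

def topic_timeline(history):
    """history: list of (date, page_title) pairs.
    Returns {ISO week number: Counter of topic hits that week}."""
    timeline = defaultdict(Counter)
    for day, title in history:
        week = day.isocalendar()[1]       # bucket visits by ISO week
        words = set(title.lower().split())
        for topic, lexicon in TOPICS.items():
            if words & lexicon:           # any keyword overlap counts once
                timeline[week][topic] += 1
    return dict(timeline)

history = [
    (date(2015, 6, 1), "Cheap flight deals"),
    (date(2015, 6, 2), "Hotel reviews Lisbon"),
    (date(2015, 6, 9), "Diet tips and symptom checker"),
]
print(topic_timeline(history))
```

Plotting those per-week counters gives exactly the kind of "how topics change over time" panel the proposal describes, before layering on the tracker information.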
Striking the balance between privacy and functionality
Nick Feamster (Princeton University) and Sarthak Grover (Princeton University)
“In this project, we aim to develop mechanisms and tools to better answer these two questions:
- How much data does a user reveal in the course of their normal browsing activity?
- To what extent does the data a service stores about a user serve to personalize that service?
“We will conduct controlled studies to measure the extent to which the decisions a user makes to protect their privacy can harm the usability of an Internet service.”
Reverse engineering Internet tracking: from niche research to a tool for everyone
Arvind Narayanan (Princeton University) and Steven Englehardt (Princeton University)
“At Princeton we have created OpenWPM, a platform for web tracking transparency. We have used it in several published studies to detect and reverse engineer Internet tracking. In the work we propose, we want to democratize web privacy measurement: to go from a niche research capability to a tool for everyone.
We will do this in two phases. First, we will use OpenWPM to publish a “web privacy census”: a monthly measurement of web privacy covering one million websites. This census will detect and measure most of the types of privacy violations that researchers have reported so far: cookie-blocking evasion, third-party PII leaks, canvas fingerprinting, and others. Second, we will create an analysis platform so that anyone without advanced knowledge can analyze the census data. The platform will allow study data, scripts and results to be packaged and distributed in a format that is easy to replicate and scale.”
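For a flavor of what detecting one of those violations involves: canvas fingerprinting scripts characteristically both draw text to an invisible canvas and read the pixels back out. OpenWPM detects this properly by instrumenting the browser at runtime; the static heuristic below is only a toy illustration, and the example scripts are made up:

```python
import re

# Toy static heuristic, NOT OpenWPM's method: flag scripts that both draw
# to a canvas and read pixel data back, the classic fingerprinting combo.
DRAW = re.compile(r"\.(fillText|strokeText)\s*\(")
READ = re.compile(r"\.(toDataURL|getImageData)\s*\(")

def looks_like_canvas_fingerprinting(js_source: str) -> bool:
    """True if the script both writes to and reads from a canvas."""
    return bool(DRAW.search(js_source) and READ.search(js_source))

tracker = "ctx.fillText('Cwm fjordbank glyphs', 2, 15); send(c.toDataURL());"
chart = "ctx.fillText('Sales 2015', 10, 10); ctx.stroke();"
print(looks_like_canvas_fingerprinting(tracker))  # → True
print(looks_like_canvas_fingerprinting(chart))    # → False
```

Running such checks at web scale, month after month, and packaging the results for non-experts is what turns a research technique into the census the project describes.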