Data Collection by Digital Assistants: Discussion

From the Data Protection Commission, DPC, I welcome Mr. Dale Sunderland, deputy commissioner, Mr. Cathal Ryan, assistant commissioner, and Mr. Ultan O’Carroll, assistant commissioner. From University College Dublin, I welcome Dr. Benjamin R. Cowan, assistant professor at the school of information and communication studies.

I draw the attention of witnesses to the fact that by virtue of section 17(2)(l) of the Defamation Act 2009, witnesses are protected by absolute privilege in respect of their evidence to the committee. However, if they are directed by the committee to cease giving evidence on a particular matter and they continue to so do, they are entitled thereafter only to a qualified privilege in respect of their evidence. They are directed that only evidence connected with the subject matter of these proceedings is to be given and they are asked to respect the parliamentary practice to the effect that, where possible, they should not comment on, criticise or make charges against any person, persons or entity by name or in such a way as to make him, her or it identifiable.

I advise witnesses that any submissions or opening statements they have made to the committee will be published on the committee website after the meeting.

Members are reminded of the long-standing parliamentary practice to the effect that they should not comment on, criticise or make charges against a person outside the House or an official either by name or in such a way as to make him or her identifiable.

I remind members and witnesses to turn off their mobile phones or put them on silent mode.

I invite Mr. Sunderland to make his opening statement.

Mr. Dale Sunderland

I thank the committee for its invitation to attend to discuss the processing of personal data in the context of digital assistants. I am one of the deputy commissioners at the Data Protection Commission, with responsibility for the consultation, supervision and guidance on policy functions of the office. Also in attendance are Cathal Ryan, assistant commissioner, who has responsibility for supervision and engagement of technology multinationals, and Ultan O’Carroll, assistant commissioner, who has responsibility for technology policy.

As the committee will be aware, the Data Protection Commission, DPC, is the lead supervisory authority in the EU under the general data protection regulation, GDPR, for most of the world’s largest technology, social media and Internet platform companies operating in the European market, given that their EU headquarters are based in this State. This responsibility brings a central role for the DPC in overseeing the compliance of these companies’ services and products with EU data protection requirements. The technology products and services of a number of these companies include digital or voice assistants. Those are the terms commonly used to describe a consumer or in-home device that operates by listening for and interpreting human voice commands or instructions. The more common examples are Google’s Google Assistant, Apple’s Siri, Amazon’s Alexa and Microsoft’s Cortana. In recent months, a number of international media reports on human reviews of voice recordings collected by voice assistant products have brought into focus the question of how technology companies are using voice data gathered via voice assistant technology. As the EU lead supervisory authority for a number of these companies, namely, Google, Apple, and Microsoft, the DPC is currently engaging with those organisations to establish the manner in which they are meeting their data protection requirements in this context. The Luxembourg data protection authority acts as EU lead supervisory authority for Amazon Alexa.

Before turning to the data protection issues arising in the use of voice assistants, I will briefly describe how such devices function in practice, which is helpful for calling out the nexus with data protection requirements. Voice assistants record user audio clips and convert those clips into a text form that acts as an input to online services such as search, weather, shopping, mapping and communications. In some cases where the devices are home-based, the instructions may also be used to control smart home devices including those for lighting, TV and media, heating and security. Devices listen continuously for instructions and may in some cases also be set up to recognise individual users' voices. They listen for keywords such as “Hey Google” or “Siri”, which trigger recording of the user's voice. Voice recordings can be stored alongside their converted text forms, either on the device or in the cloud. Service providers may also record against a user’s profile, preferences and choices that they derive from an analysis of the user’s voice commands. They may use that to serve back the information sought by a user or to add to their profile for the purposes of advertising. Raw audio signals are converted into a recognisable human word. Often, because of the variation in human voice, accent, tone or phrase, machine learning - in other words, artificial intelligence - is used with large volumes of sample voices to create a model of human speech. Different models may be needed for different languages. These models are updated over time to refine them and improve quality. In some cases, quality control will require some human review of voice snippets, especially where words are being incorrectly recognised, where background noises are incorrectly identified as human speech or to help reduce misactivations of the device.

Human review of voice data collected and processed by automated means is a common method to review, improve and train the algorithms used in voice assistant technology. While not inherently problematic or contentious from a data protection perspective, this kind of processing has many data protection elements, which must be carefully considered and assessed by the companies providing such services to ensure that the use of user data is legitimate and appropriately protected.

I will briefly mention some of the key elements arising in this context. First is ensuring an appropriate lawful basis to process personal data. Organisations need to identify a lawful basis under the general data protection regulation, GDPR, which will permit the processing of voice data in the manner proposed, such as consent or legitimate interest, which are the most commonly used legal bases. Issues with the validity of consent arise where it is not demonstrably active, informed, specific, freely given and withdrawable. Likewise, a legitimate interests basis must clearly demonstrate that the legitimate interests of the company are not outweighed by the rights of the users concerned.

The second element I would like to mention is the provision of adequate transparency to users where the type of processing taking place is concerned. Information must be in an understandable format which allows individuals to make informed choices as to how their data are processed and which facilitates the exercise of their data protection rights. With the potential for voice processing to be invisible, particularly further processing for purposes not readily obvious to users, transparency measures need to be in place when devices are being installed, when they are in use and where a user wants to review what processing their device has undertaken.

The third element is implementation of effective and integrated measures and safeguards to adhere to the principles of data protection. This element of compliance requires appropriate technical and organisational measures to be put in place to confirm that only personal data which are necessary for each specific purpose of the processing are actually processed. As I mentioned earlier, the human review of voice recordings is a common practice to improve the accuracy and effectiveness of algorithms designed to transcribe and translate voice data. However, user data must be adequately protected in this process, and indeed for all purposes for which voice recordings are used. Such safeguards and protections can include designing the process of evaluation of audio snippets, by either contractors or employees of the company, with data protection in mind from the start; being clear on what volume and size of audio snippets are necessary for each processing purpose; identifying clear conditions where it is necessary to recognise the person whose voice is processed; clear and plain transparency and privacy notices; technical security safeguards such as pseudonymisation and anonymisation of data; organisational measures; and opt-in features. There is a long list of safeguards which any company should take into account when processing such data.

The fourth and final element of compliance I wish to mention is the implementation of measures appropriate to the nature of the data being processed and the risks to users. This is a very important element of compliance because sometimes digital assistants are activated incorrectly, with the risk that private or sensitive conversations in the home or workplace are inadvertently recorded. While providers of voice assistant services have implemented preventative measures, organisations need to do more to reduce their incidence. Implementing adequate safeguards can balance out or minimise the data protection risks arising in this way.

The DPC is continuing to examine these issues in our ongoing engagement with the companies for whom we are the EU lead supervisory authority. We acknowledge and welcome the recent changes made by several companies to enhance transparency for users concerning the practice of human review of voice data to improve voice assistant technology, as well as the implementation of greater user choice on the use of data in such contexts.

As lead supervisory authority, the DPC also continues to co-operate with our EU data protection colleagues to identify common areas of concern and to identify what further steps, including guidance, may be necessary to bring additional clarity to the application of data protection requirements in the use of voice assistant technology. I thank the committee for the opportunity to be here today. My colleagues and I will be happy to take questions.

Dr. Benjamin Cowan

I thank the Chairman, members of the committee, parliamentarians and all who have been involved in convening this committee. I feel honoured to have been invited to give evidence.

I would like to give the committee an overview of the research I lead in digital assistants at UCD and at the ADAPT centre. I lead research into the user experience and user behaviour with voice-based devices and technologies. This includes looking at the barriers users see when interacting with these devices, as well as the opportunities to improve our experience with these devices. Our recent focus in both UCD and the ADAPT centre is on the growing importance of user trust with these devices as well as other artificial intelligence technologies. Colleagues such as Dr. Marguerite Barry also work specifically on the role of data ethics. I would like to specifically thank Dr. Barry and her PhD student, Mr. Gianluigi Riva, who have contributed significantly to the evidence I am about to give.

I wish to put this into context. It is hard to overestimate the prevalence and increasing reach of voice-based technologies. Smart speakers have been a catalyst for huge growth in the popularity of voice as a way of interacting with our technology. It is estimated that the number of smart speakers installed will grow to 207.9 million by the end of this year, with a lot of this growth taking place in China and the US. Data estimates suggest that this will grow to 500 million units by the end of 2023. For Ireland, just under 10% of households now own one of these devices and that figure is growing. This is a major technology that is being placed in people's homes and a major way in which people interact with devices. We also have these agents in our smartphones, so there is one in our pockets almost all the time. The reach of this technology is huge. Voice assistants are being used in cars, in healthcare contexts and in the home.

Especially in the case of smart speakers, these devices are becoming social devices by default, placed in public spaces in the home with multiple people interacting with each device. That includes friends, relatives, parents and children. The devices gather data on people who live there, as well as people who visit. They are the gateway to the Internet of things, whereby we use commands to control devices in our homes such as lights, alarms, doors and other devices. This is the context within which these technologies are being used.

I will not outline how these devices work, as we have covered that already. The data generally are used to improve the way the system operates. The more data the organisations that develop these systems have, the better the system can be at matching what a user says and how it behaves. The artificial intelligence techniques that are used, which were mentioned previously, benefit from large amounts of data. That is why these data are incredibly valuable. There are no two ways about it; these devices record our speech. Voice recordings include the information a user sends to the device but they can also include other bits of information. Paralinguistic cues such as intonation or prosody can be used to estimate age, sex and even native language. These recordings can potentially be used to build a version of a user's voice for particular commands and to impersonate the user. With the use of third-party applications, these data are also likely to be transmitted, shared or stored using other infrastructure. We must consider that infrastructure's potential security flaws. As we have mentioned previously, these devices can record users unintentionally, accidentally picking up and storing this audio. It is clear that these issues need to be addressed, and I am glad that this committee has been called to discuss them.

What do we need to think about when it comes to digital assistants? First, we need to be clear about what it means to say that these devices are technically "always on". That means there is a microphone in every home and in every pocket. It might be waiting for a particular word or utterance but it is listening. At best, these can record accidentally. At worst, this information could be intercepted and used to monitor users. That may not be happening now, but it is a possibility. This seems unnecessarily intrusive to me. It may not be to others, but users have to be made aware of the fact that the microphones on these devices are always on.

We must also be clear on why data are being stored, who accesses them and with what they are being combined. Currently the reasons data are kept are opaque to the user. The purpose is summarised as making improvements to the system. Data gatherers must be more explicit about how these data are used in terms of tracking, profiling and sharing across an organisation, as well as what is being shared with third-party organisations. This needs to be outlined explicitly to the user, along with how the data are paired with other streams to influence the experience.

Moreover, users currently have no control over which companies can access or use these data. This means there is no opportunity for users to have an active ongoing voice in how the data are used. It also bakes in the competitive advantage of big data players, who can use this ocean of speech-based data to improve their systems while competition is left with little data to play with. This makes it hard for smaller start-ups, of which there is a thriving community in Dublin and which may change the way these systems operate, to compete. For example, start-ups could change the way the systems work from a privacy perspective if they had the chance to build them more effectively. Giving users control of their data could also allow them to choose where their data recordings reside. It could lead to a boom in research in the area, as users could potentially donate data to non-profit organisations or research if they so desire.

We must also consider consent mechanisms for users. Currently these systems are used in public spaces by multiple users. A smart speaker in a kitchen or a living room captures audio recordings of several users, including neighbours, visiting relatives and children. None of these have consented to their data being recorded and stored but all are potentially being recorded. We need to discuss what that means and consider new mechanisms by which consent can be given for these data to be used.

We must also discuss the principle of privacy by design. We must consider how we can include privacy as a standard feature of the design of these devices. Some smart speakers include the option to turn off the microphone so it is not always on. "Push to interact" mechanisms would also reduce the likelihood of the accidental recording of data. We must also be aware of what these decisions mean for the user.

This type of design decision has a trade-off for users as far as convenience is concerned, whereby they might want to use the voice to initiate the agent in hands-busy, eyes-busy situations in which these technologies are really useful. We need to have a conversation about how we can bake in privacy by design.

Our work shows that privacy is a concern for users. Although it might not seem that it is influencing their behaviours yet, it is in the companies' and governments' interests to address this head-on. The data we are talking about are not a set of clicks, a search history or a set of cookies but rather our voices. Users perceive the latter data as far more personal. A hack or misuse of these data would be significant, and such a threat is potentially real. As users, therefore, we should have our eyes open as to what it means when we invite digital assistants into our homes.

I will start with a question for the Data Protection Commission and perhaps Dr. Cowan. This issue of people's data came up when we held the International Grand Committee on Disinformation and 'Fake News' here in the Houses of the Oireachtas. As Dr. Cowan said, children and visitors in a home could be profiled - are being profiled - and tracked and those data shared. What are the witnesses' views on citizens or users having power over their own data? I refer to the issue of consent and knowing how one has been profiled by these companies, whether Google, Facebook or any other company, and having access to that information such that there is transparency as to whom it is being shared with. As Dr. Cowan said, this concerns whoever comes into the home. They could be children or vulnerable people. We are all vulnerable if we do not know what is happening with our data. It is a matter of that access and transparency as to what data these companies have on us and having access to those data. Does that need legislation? What is the view at European level? Does Mr. Sunderland wish to start with that?

Mr. Dale Sunderland

Yes, I will take that. It is a very pertinent point. In our view, the GDPR provides the tools and levers, and will increasingly do so, to enable us to ensure there is proper, safe and trustworthy use of data. That is taking place now at a time when there is significant debate across not just industry but broader society, and among people who use these data and examine how they are used, about what constitutes ethical use of data in these contexts. One point to make is that this is not the sole way in which large platforms that have such devices collect data. I think this is the point the Chairman was bringing out, that is, that a profile is built from multiple data sources. The GDPR requires that there be transparency, and we have seen significant improvements in the detail of information provided, but the question is still open as to whether that information is being provided to users in a meaningful, easily accessible way that does not lead to user fatigue. These are key matters that the Office of the Data Protection Commissioner is examining and they form the context of some of these big inquiries.

Since the introduction of the GDPR, we have seen platforms beginning to introduce more granular controls for users as to how they can access the data that are being collected and delete them. There are positives in that respect but there is still a way to go. For the commission's part, we will start to drive the standards we think the GDPR requires through our regulatory work, whether through investigations or our ongoing weekly and daily engagement with these companies. However, if we are seeking to build a digitally based society and economy, user trust is essential, and transparency for users, user control and just the proper safeguarding and protected use of data are absolutely essential for companies. I think the big tech firms are getting that message and we see some change happening, but there is still a question to be answered. To be clear, there is probably more that needs to be done to ensure that the right levels of transparency are provided to users, that they have effective control and that everything they do online is not continuously tracked and added to a profile of them. As I said, some companies are now starting to introduce ways for individuals to delete their profiles, so we are seeing some positives coming into reality.

Mr. Sunderland is saying the legislation is there. Am I correct?

Mr. Dale Sunderland

The GDPR-----

He is saying the GDPR covers this and that I could ask any company about the profile it has on me and ask that company to give me all that information so I might know, be it true or false, whatever I am googling or however I am interacting online, that the company is creating a profile and I have access to it.

Mr. Dale Sunderland

There is a very important right of access under Article 15 of the GDPR. People have the right to access information that any organisation holds about them. There are some exceptions to this right - there are always some grey areas - but a number of the larger platforms are now starting to provide better user interfaces to allow users to examine these matters, to make their choice in the first place as to what data they want collected in respect of their activity on these platforms, and to make decisions as to whether they want advertising targeted at them.

It is not very clear, though, is it?

Mr. Dale Sunderland

Yes. That is the point in that-----

There is an obligation on such platforms to make this clear and user-friendly, as Mr. Sunderland said, rather than us all just clicking "OK" several times to get to see the website.

Mr. Dale Sunderland

One of our concerns at the moment is that, given that more information is being provided to users and there are now more user settings, it becomes increasingly difficult for users to manage their settings and find the information they need. This is a body of work that the tech platforms will need to continue to work out. We will drive it through our regulatory work. It is a matter of providing much more effective user interfaces for individuals in order that they know what data are being collected and how they can manage their controls and settings, and that this is done in a simple and straightforward way and, as I said, does not just lead to user fatigue or users being turned off the idea that they have increasingly been given more rights and a greater ability to control their data.

Finally, I mean this in the positive, but it sounds like we must wait for the regulation for the commission to put pressure on these companies to do this. This is the delay. They could do this much more quickly if they so wished, but we must enforce them and request them to do this under the GDPR powers.

Mr. Dale Sunderland

It is a mixture of both. The GDPR has led to a changed environment in that companies are now much more sensitive to and aware of the outcome or the potential implications of getting it wrong and have driven to make a number of changes to their platforms. Some things some of the big tech platforms have recently introduced allow for more granular choice as to what they collect when it comes to users' online activity and when they want that saved, deleted straight away or deleted after a month, for example. The platforms themselves are trying to look at what more they can do. I think it will be a mixture of both. Companies will realise their responsibilities, drive to higher standards and realise and really acknowledge and accept, which I think many are increasingly doing, that they have an obligation to the users to protect their data and use them in safe ways. The commission will fulfil its role through our regulatory work.

Dr. Benjamin Cowan

I echo the point that this is to do with regulation but also design. There is a sense that the interaction needs to be designed around the privacy-as-standard view. The interesting thing I find, especially from our research whenever we look at voice technology, is that these technologies are seen as conversational and as using conversation or speech as a kind of metaphor for interaction. That could be used as a way of informing people about what these systems can and cannot do and what they have when it comes to privacy or consent. However, the tech companies need to do this through design. That does not seem to be coming through the design of the systems at the moment. It is therefore not just a matter of legislation. It is about not only researching what the design issues are for the users but also how we can make interaction convenient for them with privacy baked in, and what they can do with the way in which these systems operate in the first place. There are some really easy wins. A conversational system could be used to ask whether it is appropriate that the platform is gathering certain data in certain contexts. That is a design choice, however, and a design decision to make, so there needs to be collaboration between design teams and people involved in data ethics as well as legislators whenever doing this. That is the only way in which a sustainable, user-centred solution will come.

I thank both the witnesses for their presentations. I think I was among a number of people who raised this issue back when Apple made an announcement - it was probably reported in The Guardian or one of those publications - that a number of people had been let go because of the issue of human review of sensitive private conversations. We were probably all a little concerned that more information than was necessary was being captured. I think part of Apple's defence at the time was, as Mr. Sunderland identified, that human review is necessary to improve the algorithms and assist in machine learning, and to an extent we get all that. However, unless there is an appropriate regulatory regime for the big tech companies, then, as night follows day, liberties will be taken at some point in the future.

The most important aspect of it is that individuals have a clear knowledge from the start that the interceptor, as it were, is always on and that it is not minded to always understand the call, whether it is Alexa, Siri or whatever. It is a bit like information on cigarette packets about the harmful nature of smoking; it requires authorities to flag the potential pitfalls of an activation that was not intentional. Many people would not have known prior to this story breaking that there was the potential for such a level of inadvertent activation. People do not know about it. I know many people who would not be anxious to have it on all the time if they knew the potential for other conversations to be recorded. There is an issue around protecting citizens from the perspective of civil liberties, which perhaps falls to the regulator. It needs greater support from the State through laws that would require a much greater volume of information upfront and a very clear opt in, because I agree with the Chair that one is linked to so many different platforms that one just clicks through. Of course things are buried and if someone goes to the bother of checking what information is held on him or her, he or she agrees to it. Even when it comes to just going through the standard approval to see what we have given different permissions for, many people just leave it and move on. That is the beginning of a slippery slope to giving much greater control to the big tech companies. We have to be clear that people must opt in in a very clear and concise way. It is not about opting in to give them something that they want at a particular moment. They must opt in very clearly and be shown the consequences of doing so. Those consequences must be writ large, not just in small print. I do not know if that comes from the regulator. I doubt if the regulator having those powers is contained in the GDPR but perhaps it is.

Perhaps the rules of engagement need to have some other type of legislative basis, rather than data just being captured, so that people are clear from the start. It is important that we get that right.

When there is an inadvertent capture of information, sensitive or otherwise, a log is prepared. It is used at a later stage in the human review to prepare better machine learning and artificial intelligence. In any of those inadvertent captures, are logs provided to the regulator? For every inadvertent activation, is a log prepared and forwarded to Mr. Sunderland as regulator?

Mr. Dale Sunderland

Our role as regulator is to ensure that the entire process is compliant. At this stage, we are more concerned about the systemic issue and nature of the processing from the start of the installation of the product through to the use of the data at the other end. There may be multiple uses for that data. We do not receive, nor do we ask for, logs. We are trying to stand back and look. In the case of misactivations, we want to know what the company has done to minimise the incidence of misactivations. That should be baked into privacy by design, as, equally, should the way misactivations are dealt with and how that is implemented into processes. Companies have introduced features such as on-device screening to understand if a person is actually engaging with the device before it starts to record. There are also further screening processes on the server side.

Misactivations still happen. As I noted in my opening remarks, this is a concern for us and we think more needs to be done. The Deputy is quite right that we are talking about private conversations in the home and in the workplace and many users were not, and still are not, aware that such misactivations can take place. An interesting aspect that arose in the human review of voice recording was that there was no awareness whatever that humans would take a voice recording and examine it to see if the algorithm was working correctly. The companies we have engaged with as lead supervisory authority have all enhanced their transparency requirements to say that this processing takes place.

On whether something more needs to be done, GDPR holds the tools because the processing of data, from the point it is collected, must be legitimate, fair, lawful and transparent. It is about the application of those important data protection principles at each step of the processing of personal data. From the moment someone says something, there should be safeguards and protections built in to ensure voice recordings are only taken when someone intends to engage with the device. As Dr. Cowan said, data protection by design or default are fundamental principles of data protection law. My colleague, Mr. Ultan O'Carroll, has recently completed work as a co-rapporteur on the European Data Protection Board's guidelines on data protection by design and default. They are now out for consultation. It is something we want to see more of from the companies concerned. Before one ever starts to collect data - before the new ways of processing personal data to provide services are even designed - data protection concerns should be built in. Along the chain, one should look at what safeguards can be built in. It may be minimisation of clips or anonymisation of voice recordings. There is a legitimate purpose for this review but it is about what the company needs to do to ensure the risk is minimised.

I am trying to get to the point where the purchaser or user is informed upfront that there is a potential for this hazard so that before someone begins the process and there is the potential for any processing of data, he or she is clear that there are potential pitfalls. We all like to think that technology is now so advanced that there is no potential for mistakes to be made, but, of course, there is. It is on a development curve. The companies rarely provide those potential hazards upfront and they do not follow that path for a good reason, because far fewer people would avail of the technology and the service if they thought there was potential for these types of issue to arise. It is less about the processing of the data in the first instance than telling them about the potential pitfalls. I doubt that is contained in the GDPR; it is more about a clear description of what the technology is and is not. Anyone who purchases a pharmaceutical product today will find a list of reactions that one might suffer. Such a warning is not contained in the GDPR.

Mr. Dale Sunderland

A significant number of those issues are covered by the transparency requirements under the GDPR. More could be done on user notices; that is a very good point. As well as the transparency elements, we have tried to focus on ensuring the risk to this is minimised to the greatest extent it can be so that it is not just a case of continuing where there is a significant risk of misactivations and it is a case of users beware. This is so that users are protected from misactivations in the first place and that the data reviewed by humans are properly anonymised or pseudonymised and that they would take away personal identifiers so that when a human goes to review it that the privacy or data protection risk has been minimised to the greatest extent possible. That is what we mean by data protection by design and that is how we can do it. More could be done with user interfaces and the setup of these technologies to inform users. There is a long way to go but we have seen some positive moves in recent months by the service providers and manufacturers of the devices.

Dr. Cowan wished to come in on this.

Dr. Benjamin Cowan

What is being discussed is how consent is gathered, and where it is informed consent that people are given all the risks before they purchase the devices, make the decision and then live by that. However, there may also be an issue of ongoing consent so that when something is inadvertently gathered, the system could alert the customer of that.

That is a decision the design teams could make at these companies on the basis that that is what they want in their interaction. One could have agents that diversify from the standards in the market by doing that. I do not want to speculate about why they are potentially not doing that, but there is a sense that we need to think a bit more like behavioural scientists when it comes to that issue. We are trying to get consent, not just at the start, so that people know what the systems do and how they are going to gather data. That is important, but also that it goes through the interaction, because these are in people's houses for not just a day or a week but potentially for years and years. The way that the data are being used may change. We need to have that element of dynamic consent as we go through, but that is a design choice as well as a legislation choice. Whether there needs to be legislation that says these companies must make users aware throughout the interaction, however which way they do that, that has to happen. That could be a way of doing it but the companies then decide how that is done. It is not good to do it through a written description that someone has to look through, because people want to get that system set up so they just want to click on through.

Just to clarify what Dr. Cowan is saying, do we need to legislate for data protection by design? He hopes the companies are making ethical design decisions but how do we make them do that? Is legislation required? Who would like to respond to that?

Mr. Dale Sunderland

My colleague, Mr. O'Carroll, would like to comment.

Mr. Ultan O'Carroll

Part of what we were doing in Europe was an opinion on data protection by design. The key takeaway from that is that Article 25 of the GDPR imposes a legal obligation on data controllers, the organisations that create these devices and do the processing, to account for data protection by design in everything they do. It is not just about things like data minimisation or security. It is to do with transparency and the legal basis for how companies gather consent, the way they process data, the way they design their processing chain from start to finish and how they dispose of information. It has to be effective. It has to be measurable in some respects to ensure that it is effective and that it can be demonstrated to be effective. That is a key part of what they have to do as an obligation.

Article 25 is fairly simple. It is not extensive within the GDPR but it applies horizontally across all activities that data controllers have to do and to account for. Now, as this public consultation period on that opinion comes to a close, and as we get to the point where we can revise it based on public consultation, there are opportunities to emphasise those issues in much clearer ways. GDPR allows us to do that in a number of cases with other accountability tools, such as codes of conduct and certification. One might for instance see a code or certification like with signage for CCTV, which might say:

This is a voice processing device. It does these things. It may use human review in certain cases. Your choices are explained here. You can withdraw consent here and here.

Those are the kinds of possibilities that are available for transparency measures but it is very much to do with that difference in modality between what we are used to dealing with keyboards and computers to in-home devices where our voices are being processed, where the processing is sometimes invisible or it could be said to be that because it is going on in the background. It is ambient. That changes the way we interact with these things and it changes our expectations as well.

Is it the case that it is too complex to regulate or control? The companies are obviously not doing this if we have had these glitches. From our point of view, I am just wondering how we deal with it. Mr. O'Carroll said Article 25 of the GDPR provides the legal framework. Is the issue enforcement? These companies are not adhering to the GDPR standards as specified in the legislation.

Mr. Ultan O'Carroll

That is to be determined. The work we are doing at the moment is to head towards that conclusion, because the systems are complicated. There are many different moving parts and different things to consider. Different kinds of data are being processed. In some cases, it might be biometric and in others, it might not be. When organisations do a quality control process, it could be based on a different kind of legal framework than the gathering of the data in the first place. There are lots of different things to do. We are going to work towards that and determine what extra work or gaps there are that need to be addressed.

We had some good discussions at the recent international grand committee and at the think tank we had in the Westin Hotel. I recall at that event that experts from the Carnegie UK Trust explained the duty of care process, which was all about process design and the ability of states or regulators to set out in clear terms the design processes. It was not the case that we would just rely on Article 25 but that we would be specific. I do not mean down to the individual content moderation but in the design process. My understanding is that informed its paper on harmful content but it could inform this area here as well. Is the European system looking at that duty of care process as a Europe-wide standard rather than just relying on the broad articles of the GDPR to protect us? Would that be a good idea?

Mr. Dale Sunderland

Mr. O'Carroll referred to the GDPR codes of conduct and certification. The GDPR says that they are mainly industry-led but data protection authorities have a role to encourage the development of codes and certification. A code of conduct is not what we might traditionally know as a code in the sense that everyone signs up to behave themselves, but in a GDPR context, it is about taking a principle of transparency, for example, and saying at a more granular level, for voice assistant technology, what is the industry going to sign up to say how it delivers transparency for this type of technology. The GDPR then requires that there be an independent monitoring body and that the code of conduct be signed off by the relevant data protection authority or if it is a pan-European, cross-border code covering a number of jurisdictions, signed off by the European Data Protection Board. We are starting to gear up for the DPC and other data protection authorities to encourage the development of codes. An example is that the industry could come together to develop a code for voice assistant technology misactivations. It would go through a regulatory scrutiny process and be signed off in the context of voice assistance, because it would be a cross-border processing code, ultimately by the European Data Protection Board. We have been very proactive in that space in providing guidance. Mr. Cathal Ryan was the lead rapporteur on the codes of conduct paper that the European Data Protection Board published last year. That is an area we will progress in 2020. We have said in another context that we would focus on children and children's rights in the first instance. We will work with the tech companies and try to bring them to the table to agree a code of conduct. There is lots more scope for that concept.

At the international grand committee, the Data Protection Commissioner, Ms Helen Dixon, said that this process of starting the GDPR has been in existence for 18 months. Has the European regulator seen cases coming through using Article 25 and asking questions about that consent process and what consent is? Will that legal process drive this as well or is it just coming from the data protection commissioners themselves? Are there cases before the DPC at present?

Mr. Dale Sunderland

Yes, we have 21 open inquiries into the large tech companies at the moment. A number of those investigations cover the principles of data protection, issues around the standard of consent and what transparency means. As the commissioner mentioned at the hearing to which the Deputy referred, we have significant investigations into Adtech. Through those investigations, we will look in detail at how companies say they are complying with the law and aligning what they identify as their compliance features with what we say is the correct interpretation of the GDPR. They will culminate in formal decisions of the DPC having gone through the measures of co-operation with all the other data protection authorities.

The DPC has an extensive workload, and a significant responsibility because, as Mr. Tuohy said earlier, 100,000 people are working in this sector and we want to have a good reputation. Does the DPC have all the necessary resources or does it need additional staffing resources from the State to manage the workload?

Mr. Dale Sunderland

The commissioner spoke on this recently as well. On a positive note, we have been on a significant upwards trajectory in recent years. We have reached almost 140 staff. The funding we will receive from the Exchequer for next year will allow us to increase to a figure in the region of 170 staff. At the time we expressed our disappointment that we did not get everything we asked for, but we will make do with what we have. The story of resourcing the DPC must continue in the future. We are in the upper tier of best resourced data protection authorities but we are not at the very top, yet we carry the disproportionate burden of being the first line of defence for regulating the technology companies in Europe.

That story has not ended. The resourcing of the DPC must continue to be a focus in the years ahead.

A clear message from all this work, particularly that of the international grand committee, is that the DPC has a critical role. It was reckless and wrong of the Department of Public Expenditure and Reform to turn down the application for additional resources and that decision should be reversed.

On voice listening devices, Dr. Cowan stated that we must take into consideration that if we own a smartphone, we have one of these agents in our pockets at all times. He is indicating that if members have their phones on the table during a private session of the committee, the microphones and recording devices of the phones may pick up the conversation if someone has access to them.

Dr. Benjamin Cowan

That is potentially the case. The literature on computer security indicates that there are sensors on the devices and if one can access the sensors, one can use them maliciously. It is a possibility.

An increasing number of tech companies and governments insist that attendees do not bring their phones into private meetings. Does Dr. Cowan recommend that members of an Oireachtas committee should leave their phones outside the room while the committee is in private session?

Dr. Benjamin Cowan

That is not for me to decide. It may be an issue of importance. To some extent, the matter needs to be put into context. It depends on the security of the devices as well as the security these agents may have. Tech companies are best placed to discuss the security arrangements for devices. That may be worth considering.

I refer to public concern about this issue. Mr. Sunderland stated: "Voice assistants record user audio clips and convert those clips into a text form that acts as an input to online services such as search, weather, shopping" and so on. Many people have contacted members regarding verbal content recorded on their home voice device being converted into data that influences their search or advertising. That is the reality people are experiencing.

Mr. Dale Sunderland

That may be correct. Every platform has different ways of using data. There is no ubiquity in that regard. Generally, the transcript of what one says is used to provide what one asked for. For example, if one asks Siri what the weather is in Dublin, that voice recording is transcribed into text, which is put through the company's systems to provide the information sought. The company may then deduce that the user is interested in weather or, if the user asks about a holiday, that he or she is interested in a certain type of holiday. An inference could be drawn that the user is interested in a particular matter. If one's settings allow targeted advertising, the information recorded would be used to inform the selection of advertisements one may see on one's smart device. One could be using a home assistant and receive targeted advertisements on another device because it is joined up in the background of the platforms. We want to see more user choice regarding and control over those settings. Some platforms now allow users to turn off the collection or recording of web and app activity such that the information is not used to build a profile. Some companies are introducing deletion controls to allow users to delete audio data. Previously drawn inferences are cleared and the slate is wiped clean.

The challenge for us in terms of our educational work as a regulator is how to educate users better on the choices that exist as well as driving higher levels of user choice and transparency. We need to ensure users engage with the settings in a time and space that suits them. Making decisions on all of the settings while signing up to something does not always work because the user wishes to get to the end service.

I have discussed this matter with my teenage children, who have given up and trade access for everything. Dr. Cowan may be examining the issue of whether there is an underlying flaw in the basic business model whereby we have given up our personal data in return for free access and, in turn, the companies have this incredible surveillance capability and advertising power. Is Dr. Cowan beginning to examine this area? He referred to many small start-ups that are looking for opportunities. Is there a big problem in terms of a monopoly on access to the data?

Dr. Cowan wished to come in on another issue. I ask him to address it now as I am going to try to wrap up shortly.

Dr. Benjamin Cowan

I will address both issues. There is definitely a monopoly in respect of the data gathered by several major tech companies. We should try to ensure there is a level playing field in that regard for start-up companies and other organisations. There may be an issue of inequality because data are such an important part of these companies' business models. The data are used to build better tech as well as gain insight on users. It is critical to realise that it is not just about using the data to build user profiles. Those user profiles or information are used to make better technology, particularly in the case of voice assistants. We should reflect on what we are doing in that situation. I kind of agree with the Deputy on that issue.

On resources, much research is being carried out on trust in the context of artificial intelligence systems. That is being pursued through the ADAPT centre funded by Science Foundation Ireland as well as through some of the work we do in UCD. If we wish to signal to the tech companies how to do this better in terms of informed consent, dynamic consent or how to better design systems with that in mind, we need better funding of research.

Money is required for UCD as well.

Dr. Benjamin Cowan

We need to consider which aspects need to be focused on and considered. We do some great research in this area in Ireland. If the research was better funded, we would be able to do more.

I do not disagree.

I thank the witnesses for appearing. I dealt with representatives of the DPC at the Joint Committee on Justice and Equality when we were processing the Data Protection Act 2018. I am alarmed by the soft wording in the statement provided by the DPC. There is reference to ongoing engagement with the companies. That is too soft and weak. The DPC has strong legislative back-up under GDPR and there should be enforcement consequences. Engagement is not good enough when damage is being done by companies that are based here. People's private conversations and lives are being invaded and the GDPR is being breached. It is not good enough to state that the DPC has yet to reach a conclusion. It has a remit to inform the public about the breach of the GDPR by these companies and the fact that information on people's private conversations and lives is being collected. We know from evidence presented in The Guardian and by whistleblowers that that has happened, and is happening. However, the DPC is merely engaging with the companies. It should be concentrated on enforcement.

Have the companies become too big to regulate? Is the DPC too small to regulate them? There is a trend across several areas whereby damage is continuously being done. A leak of sensitive information could impact on people's private lives. I do not criticise Mr. Sunderland. There are funding issues. I acknowledge the DPC is attempting to increase its ability to deal with the concentration of data that rests in this country.

The committee spent months discussing the digital age of consent.

That is not being adhered to here. There is no lawful basis for what they are doing. It is not transparent because nobody knows it is happening apart from people who happen to read The Guardian and other newspapers. People with Apple watches or who subscribe to Amazon do not know that it is happening. What is the commission doing to drive awareness of this? Is there evidence of the sharing or sale of this data? How many companies are processing data of this nature? Is there a mechanism through which the practice can be actively monitored? I would like to hear more from the commission about enforcement. Can the officials provide detail about what that might mean for companies? Is the enforcement consequence sufficient to remove the clear breach? These companies have decided to breach the regulatory framework and GDPR and have knowingly breached them.

The Deputy's points are well made.

It is contempt for the legislative position within an Irish and European context. It is worrying that we are just allowing this to continue.

Mr. Dale Sunderland

We are engaging with other regulatory authorities across Europe on this matter on a regular basis. We discuss it with each other. There is no consensus about what compliance is with regard to some of these issues so it cannot be taken de facto that there has been a breach because every element of compliance must be looked at. We have achieved some change outside of statutory investigations over the past few months. If it transpires that we need to look at this in a different context, the commission will do so but the companies with which we have been dealing have now enhanced transparency. They are bringing their users through new engagement flows to bring customers' attention to the existence of this sort of processing. They are introducing new technology around misactivation. This is at one end, which is the area for which I am responsible in terms of the supervision element of the commission's work. We drive change, including behavioural change, outside investigations.

The commission has opened 21 statutory investigations into the big technology companies. The first decisions in respect of those investigations will start to conclude in early 2020. That is the spectrum of regulatory activity we must acknowledge. It involves investigations where there will be findings and where appropriate and necessary, sanctions, including administrative fines. At the other end, we interact effectively and robustly with the companies to drive them to make changes outside of investigations. If there is a need for the commission to go into a formal investigation mode, it will consider that fully and take that action.

If the commission is driving changes, it must ask what was wrong. If it has to improve transparency, clearly there was an issue with what it was doing prior to this. Mr. Sunderland did not mention any investigations in his opening statement.

The witnesses did so earlier with regard to the 21 cases.

Mr. Dale Sunderland

I doubt whether any commercial entity or private, public or voluntary sector company is always in full compliance with data protection law. I know there is context, risk and large-scale data processing. We are looking through all these issues and are achieving change as we engage with the companies. If based on everything we have gathered, we make an assessment that there is a need to look at this on a statutory footing through an inquiry under the Data Protection Act, the commission will make that decision. We are doing this in co-operation with our European data protection colleagues. The European Data Protection Board is looking at whether there is a need for further guidance in this field. It is the view of all of the colleagues with whom we engage at a European level that there are issues that need to be ironed out but there might be a need for further guidance from the European data protection community on this issue.

This is not a personal criticism. It is in a European context as well. Is big business in Europe able to manage the public affairs element of this with the economic side of the commission, for example? Are data issues and privacy concerns subservient to the profits of these corporations and their capacity to lobby? There will be serious consequences because we have not collectively enforced the GDPR. People will face the consequences. We have had other cases of breaches of data where lives have been lost. If someone's personal life is revealed because of a data breach, which is probable based on the scale of this, there will be a significant reaction and enforcement so we must be ahead of the curve.

Mr. Dale Sunderland

The DPC and data protection authorities in all other member states are entirely independent from their national governments and the European Commission. It is our job to take the legal framework we have been given and dispassionately regulate against that framework without fear or favour and this is what we are doing. We are coming to the conclusion of our major investigations so the results of those will start to appear in early 2020. That pipeline is filling up. As I mentioned earlier, some of the key principles of data protection that have been in place here for 30 years feature in those investigations. We make determinations on them and impose sanctions and other corrective measures where appropriate and necessary. We have the means to regulate this sector and are doing so in co-operation with other European data protection authorities.

I am conscious of time because we have two more sessions.

I thank the contributors to the debate, which has been very informative. Does Mr. Sunderland think there is an information deficit among the public regarding this issue? Does he think the normal mother, father or other adult is aware of the capabilities of whatever device is in front of them? Does he believe there is a deficit of this information? Do the public realise that the phones in front of us have the capability to monitor what we are saying, harvest that information and allow us to become the new currency? Regarding this deficit of information, how can the commission step in to inform the public about what these machines do so that they have a genuine understanding? As a public representative, until I sat in front of the witnesses today, I did not realise half of what was happening. What needs to be done so that people can be informed about what a mobile phone can do?

Mr. Dale Sunderland

There is a two-pronged approach. First, the companies have an obligation to drive awareness of their products and services. Deputy Dooley raised this issue. More could be done at that point. It is in the context of a broad number of services and what is nearly information overload in some respects, for example, when someone hits a webpage and is asked to consent to cookies, etc. There is an element of user fatigue in all of that. One of the answers is better user interfaces, which we touched on earlier, and better ways to bring information, including surface information, to individuals. Let us not forget that these companies have some of the smartest and most capable and talented designers in the world working for them so it is not just about designing a good user experience. There must also be better ways to bring this type of information to the surface for users. That is something of which we are very mindful.

We also have an obligation through our guidance functions. The paradox is that guidance we have published where, for example, we spoke about how people might be nudged towards choices online, etc., has not been picked up so there is a challenge there for us. We are trying to consider how we can best tackle the important issue of user awareness. It is in a context where we increasingly function in an online connected world. There are significant risks if companies do not implement their data processing practices correctly and if users are unable to inform themselves or are unaware of the risks but also the choices they have to control their data.

Does Dr. Cowan wish to come in?

Dr. Benjamin Cowan

The users with whom we do research are quite aware of this. It is a significant issue about which they are concerned. Whether it acts as a barrier to them using the technology, or leads them to stop using it, is an open question. There is a sense of there being an awareness of being monitored based on the clicks they make and the voice aspects of technology in respect of interactions so there is awareness on the part of some users but it is not universal.

There definitely does need to be more information about that and it needs to be much more upfront. Removing my voice assistant researcher hat and putting on my behaviour change researcher hat, it is not just about information, it is about when the information is delivered. The action is to be made when someone is to interact or someone is to decide that they are going to let this system gather a piece of data they may not want it to gather. We need to figure that out and identify what mechanism we need to have a person make an informed decision based on that, at the point the decision is being made. That is a design problem as well as a problem of understanding what the interaction is. It is about how we design that more effectively, thinking about nudge and behaviour change-based technologies. That is what many of the tech companies are using to gather this data and to get us to purchase new things and more stuff. They are using nudge-based techniques so potentially we should be using nudge-based techniques for privacy as well.

We were talking about digital literacy. Would it come under that?

Dr. Benjamin Cowan

There is a bit of digital literacy as well. People need to know that information. The literature is very clear in terms of behaviour change. Information alone is not the answer to changing behaviour.

The other issue is the size of the operation. We are in an unusual space in Ireland. We have a major conglomeration of tech companies that are a major force. The Data Protection Commission is going up to 170 staff this year alone. We are talking about multinationals that have very deep pockets. Is it big industry against small government? The commission has an independent remit but it is tied in budgetary terms as regards staffing. Even though it is independent, when it comes to budget issues it is tied in to the Government. What is the scale of the deficit at the moment and where do we need to go regarding the scale of funding that is required? Are we looking at some kind of tax on the multinationals so they can be part of the commission's programme and so it can fund itself? Where do the witnesses see the funding stream coming from? Is it strictly coming from central Government? Is there a European element to it? Should there be an industry element as well?

Mr. Sunderland also wanted to come in on a previous point. He can do that now if he wishes.

Mr. Dale Sunderland

In terms of funding, we sought €21 million for next year and I think we received just short of €16 million. I want to emphasise that the figure was up from approximately €1.5 million six years ago. It has been a steep trajectory, but necessary given the context in which we operate. We receive our funding from the central Exchequer. We are due to become a separate Vote for next year. As I said earlier, I think our funding needs to increase to allow us to be more effective. We are committed to getting this job of work done and are working very diligently at it. The first results of those inquiries and the draft decisions will come early next year. However, it is not just about big tech. We have received 11,000 complaints since the GDPR came into effect of which 9,000 came this year. That is up from 2,500 the year before the GDPR. That is a huge increase. They are the ordinary, everyday complaints about access requests from someone seeking their information from a doctor or hospital, for example. All of those complaints must be addressed as well. It is not solely about the big tech companies.

I think it is likely that our funding will continue to be provided from the central Exchequer. That is quite normal for data protection authorities across the European Union. We will continue to make the case as to why we need that funding. On staffing, we can compare ourselves with some of our peer data protection authorities and the larger authorities in the European Union. For example, the French data protection authority has over 200 staff. The Italian authority has around 200 staff and the Dutch authority is growing towards 200. The UK Information Commissioner's Office has about 700 staff so it is somewhat of an outlier. When we are the lead authority for these tech companies and their single interlocutor, it means they get to deal directly with us and we take on the responsibility of engaging with all of the other data protection authorities. There is significant overhead in that simply on the administrative side, let alone engaging in regulatory matters. There are tools in the GDPR that we want to maximise such as joint operations. We are actively trying to promote the concept among our peer authorities that we would bring members from other authorities on board in helping us run our investigations. We are hopeful that there will be some progress on that in the near future. I hope I have answered all the Senator's questions.

Mr. Sunderland answered very comprehensively. I thank all the witnesses for an invaluable session and for being so frank and giving such comprehensive answers. We will now suspend to allow the next group of witnesses to take their seats.

Sitting suspended at 4.15 p.m. and resumed at 4.20 p.m.