Noah Apthorpe
Assistant Professor
Computer Science
Colgate University

McGregory Hall
napthorpe @ colgate . edu

Google Scholar
GitHub
arXiv

Publications

* indicates undergraduate advisee co-author

Conferences & Journals

Privacy Governance Not Included: Analysis of Third Parties in Learning Management Systems
Madelyn Rose Sanfilippo, Noah Apthorpe, Karoline Brehm, Yan Shvartzshnaider
Information and Learning Sciences. 2023
This paper aims to address research gaps around third-party data flows in education by investigating governance practices in higher education with respect to learning management system (LMS) ecosystems. The authors answer the following research questions: how are LMS and plugins/learning tools interoperability (LTI) governed at higher education institutions? Who is responsible for data governance activities around LMS? What is the current state of governance over LMS? What is the current state of governance over LMS plugins, LTI, etc.? What governance issues are unresolved in this domain? How are issues of privacy and governance regarding LMS and plugins/LTIs documented or communicated to the public and/or community members?
Automating Internet of Things Network Traffic Collection with Robotic Arm Interactions
Xi Jiang*, Noah Apthorpe
Journal of Communications. 2023
Consumer Internet of Things research often involves collecting network traffic sent or received by IoT devices. These data are typically collected via crowdsourcing or while researchers manually interact with IoT devices in a laboratory setting. However, manual interactions and crowdsourcing are often tedious, expensive, inaccurate, or do not provide comprehensive coverage of possible IoT device behaviors. We present a new method for generating IoT network traffic using a robotic arm to automate user interactions with devices. This eliminates manual button pressing and enables permutation-based interaction sequences that rigorously explore the range of possible device behaviors. We test this approach with an Arduino-controlled robotic arm, a smart speaker, and a smart thermostat, using machine learning to demonstrate that collected network traffic contains information about device interactions that could be useful for network, security, or privacy analyses. We also provide source code and documentation allowing researchers to easily automate IoT device interactions and network traffic collection in future studies.
You, Me, and IoT: How Internet-Connected Consumer Devices Affect Interpersonal Relationships
Noah Apthorpe, Pardis Emami-Naeini, Arunesh Mathur, Marshini Chetty, Nick Feamster
ACM Transactions on Internet of Things (TIOT). 2022
Internet-connected consumer devices have rapidly increased in popularity; however, relatively little is known about how these technologies are affecting interpersonal relationships in multi-occupant households. In this study, we conduct 13 semi-structured interviews and survey 508 individuals from a variety of backgrounds to discover and categorize how consumer IoT devices are affecting interpersonal relationships in the United States. We highlight several themes, providing exploratory data about the pervasiveness of interpersonal costs and benefits of consumer IoT devices. These results inform follow-up studies and design priorities for future IoT technologies to amplify positive and reduce negative interpersonal effects.
SkillBot: Identifying Risky Content for Children in Alexa Skills
Tu Le, Danny Yuxing Huang, Noah Apthorpe, Yuan Tian
ACM Transactions on Internet Technology (TOIT). 2022
Many households include children who use voice personal assistants (VPA) such as Amazon Alexa. Children benefit from the rich functionalities of VPAs and third-party apps but are also exposed to new risks in the VPA ecosystem. In this paper, we first investigate “risky” child-directed voice apps that contain inappropriate content or ask for personal information through voice interactions. We build SkillBot — a natural language processing (NLP)-based system to automatically interact with VPA apps and analyze the resulting conversations. We find 28 risky child-directed apps and maintain a growing dataset of 31,966 non-overlapping app behaviors collected from 3,434 Alexa apps. Our findings suggest that although child-directed VPA apps are subject to stricter policy requirements and more intensive vetting, children remain vulnerable to inappropriate content and privacy violations. We then conduct a user study showing that parents are concerned about the identified risky apps. Many parents do not believe that these apps are available and designed for families/kids, although these apps are actually published in Amazon’s “Kids” product category. We also find that parents often neglect basic precautions such as enabling parental controls on Alexa devices. Finally, we identify a novel risk in the VPA ecosystem: confounding utterances, or voice commands shared by multiple apps that may cause a user to interact with a different app than intended. We identify 4,487 confounding utterances, including 581 shared by child-directed and non-child-directed apps. We find that 27% of these confounding utterances prioritize invoking a non-child-directed app over a child-directed app. This indicates that children are at real risk of accidentally invoking non-child-directed apps due to confounding utterances.
GKC-CI: A Unifying Framework for Contextual Norms and Information Governance
Yan Shvartzshnaider, Madelyn Sanfilippo, Noah Apthorpe
Journal of the Association for Information Science and Technology (JASIST). 2022
Privacy-enhancing technologies that incorporate a socially meaningful conception of privacy, one that meets people's expectations and is ethically defensible, need to factor in contextual privacy norms and information governance as part of their design. This involves understanding what information handling practices users deem acceptable, what factors influence users' perceptions and behaviors, and how informational norms evolve. In this paper, we present GKC-CI, a unifying framework for examining contextual privacy norms and information governance in a given context to help structure research inquiries around these questions.
Practical Assignments for Teaching Contextual Integrity
Noah Apthorpe
3rd Annual Symposium on Applications of Contextual Integrity. 2021
This discussion prompt outlines several active learning techniques for teaching Contextual Integrity (CI), including case studies, privacy norm surveys, policy evaluations, formal logic, and system audits. It also highlights challenges to CI pedagogy in technical computer science courses and poses discussion questions to foster collaborative development of practical assignments for teaching CI.
IoT Inspector: Crowdsourcing Labeled Network Traffic from Smart Home Devices at Scale
Danny Yuxing Huang, Noah Apthorpe, Frank Li, Gunes Acar, Nick Feamster
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (Ubicomp/IMWUT). 2020
The proliferation of smart home devices has created new opportunities for empirical research in ubiquitous computing, ranging from security and privacy to personal health. Yet, data from smart home deployments are hard to come by, and existing empirical studies of smart home devices typically involve only a small number of devices in lab settings. To contribute to data-driven smart home research, we crowdsource the largest known dataset of labeled network traffic from smart home devices from within real-world home networks. To do so, we developed and released IoT Inspector, an open-source tool that allows users to observe the traffic from smart home devices on their own home networks. Since April 2019, 4,322 users have installed IoT Inspector, allowing us to collect labeled network traffic from 44,956 smart home devices across 13 categories and 53 vendors. We demonstrate how this data enables new research into smart homes through two case studies focused on security and privacy. First, we find that many device vendors use outdated TLS versions and advertise weak ciphers. Second, we discover about 350 distinct third-party advertiser and tracking domains on smart TVs. We also highlight other research areas, such as network management and healthcare, that can take advantage of IoT Inspector's dataset. To facilitate future reproducible research in smart homes, we will release the IoT Inspector data to the public.
Going Against the (Appropriate) Flow: A Contextual Integrity Approach to Privacy Policy Analysis
Yan Shvartzshnaider, Noah Apthorpe, Nick Feamster, Helen Nissenbaum
The Seventh AAAI Conference on Human Computation and Crowdsourcing (HCOMP). 2019
We present a method for analyzing privacy policies using the framework of contextual integrity (CI). This method allows for the systematized detection of issues with privacy policy statements that hinder readers’ ability to understand and evaluate company data collection practices. These issues include missing contextual details, vague language, and overwhelming possible interpretations of described information transfers. We demonstrate this method in two different settings. First, we compare versions of Facebook’s privacy policy from before and after the Cambridge Analytica scandal. Our analysis indicates that the updated policy still contains fundamental ambiguities that limit readers’ comprehension of Facebook’s data collection practices. Second, we successfully crowdsourced CI annotations of 48 excerpts of privacy policies from 17 companies with 141 crowdworkers. This indicates that regular users are able to reliably identify contextual information in privacy policy statements and that crowdsourcing can help scale our CI analysis method to a larger number of privacy policy statements.
Evaluating the Contextual Integrity of Privacy Regulation: Parents' IoT Toy Privacy Norms Versus COPPA
Noah Apthorpe, Sarah Varghese*, Nick Feamster
Proceedings of the 28th USENIX Security Symposium (USENIX). 2019
Increased concern about data privacy has prompted new and updated data protection regulations worldwide. However, there has been no rigorous way to test whether the practices mandated by these regulations actually align with the privacy norms of affected populations. Here, we demonstrate that surveys based on the theory of contextual integrity provide a quantifiable and scalable method for measuring the conformity of specific regulatory provisions to privacy norms. We apply this method to the U.S. Children's Online Privacy Protection Act (COPPA), surveying 195 parents and providing the first data that COPPA's mandates generally align with parents' privacy expectations for Internet-connected "smart" children's toys. Nevertheless, variations in the acceptability of data collection across specific smart toys, information types, parent ages, and other conditions emphasize the importance of detailed contextual factors to privacy norms, which may not be adequately captured by COPPA.
Keeping the Smart Home Private with Smart(er) IoT Traffic Shaping
Noah Apthorpe, Danny Yuxing Huang, Dillon Reisman, Arvind Narayanan, Nick Feamster
Proceedings on Privacy Enhancing Technologies Symposium (PETS). 2019
The proliferation of smart home Internet of Things (IoT) devices presents unprecedented challenges for preserving privacy within the home. In this paper, we demonstrate that a passive network observer (e.g., an Internet service provider) can infer private in-home activities by analyzing Internet traffic from commercially available smart home devices even when the devices use end-to-end transport-layer encryption. We evaluate common approaches for defending against these types of traffic analysis attacks, including firewalls, virtual private networks, and independent link padding, and find that none sufficiently conceal user activities with reasonable data overhead. We develop a new defense, "stochastic traffic padding" (STP), that makes it difficult for a passive network adversary to reliably distinguish genuine user activities from generated traffic patterns designed to look like user interactions. Our analysis provides a theoretical bound on an adversary's ability to accurately detect genuine user activities as a function of the amount of additional cover traffic generated by the defense technique.
User Perceptions of Smart Home IoT Privacy
Serena Zheng*, Noah Apthorpe, Marshini Chetty, Nick Feamster
Proceedings of the 2018 ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW). 2018
Smart home Internet of Things (IoT) devices are rapidly increasing in popularity, with more households including Internet-connected devices that continuously monitor user activities. In this study, we conduct eleven semi-structured interviews with smart home owners, investigating their reasons for purchasing IoT devices, perceptions of smart home privacy risks, and actions taken to protect their privacy from those external to the home who create, manage, track, or regulate IoT devices and/or their data. We note several recurring themes. First, users' desires for convenience and connectedness dictate their privacy-related behaviors for dealing with external entities, such as device manufacturers, Internet Service Providers, governments, and advertisers. Second, user opinions about external entities collecting smart home data depend on perceived benefit from these entities. Third, users trust IoT device manufacturers to protect their privacy but do not verify that these protections are in place. Fourth, users are unaware of privacy risks from inference algorithms operating on data from non-audio/visual devices. These findings motivate several recommendations for device designers, researchers, and industry standards to better match device privacy features to the expectations and preferences of smart home owners.
Security and Privacy Analyses of Internet of Things Children's Toys
Gordon Chu*, Noah Apthorpe, Nick Feamster
IEEE Internet of Things Journal (IoT-J). 2018
This paper investigates the security and privacy of Internet-connected children’s smart toys through case studies of three commercially-available products. We conduct network and application vulnerability analyses of each toy using static and dynamic analysis techniques, including application binary decompilation and network monitoring. We discover several publicly undisclosed vulnerabilities that violate the Children’s Online Privacy Protection Rule (COPPA) as well as the toys’ individual privacy policies. These vulnerabilities, especially security flaws in network communications with first-party servers, are indicative of a disconnect between many IoT toy developers and security and privacy best practices despite increased attention to Internet-connected toy hacking risks.
Discovering IoT Smart Home Privacy Norms using Contextual Integrity
Noah Apthorpe, Yan Shvartzshnaider, Arunesh Mathur, Dillon Reisman, Nick Feamster
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (Ubicomp/IMWUT). 2018
The proliferation of Internet of Things (IoT) devices for consumer "smart" homes raises concerns about user privacy. We present a survey method based on the Contextual Integrity (CI) privacy framework that can quickly and efficiently discover privacy norms at scale. We apply the method to discover privacy norms in the smart home context, surveying 1,731 American adults on Amazon Mechanical Turk. For $2,800 and in less than six hours, we measured the acceptability of 3,840 information flows representing a combinatorial space of smart home devices sending consumer information to first and third-party recipients under various conditions. Our results provide actionable recommendations for IoT device manufacturers, including design best practices and instructions for adopting our method for further research.
Automatic Neuron Detection in Calcium Imaging Data using Convolutional Networks
Noah Apthorpe, Alexander Riordan, Rob Aguilar*, Jan Homann, Yi Gu, David Tank, H. Sebastian Seung
Advances in Neural Information Processing Systems (NIPS). 2016
Calcium imaging is an important technique for monitoring the activity of thousands of neurons simultaneously. As calcium imaging datasets grow in size, automated detection of individual neurons is becoming important. Here we apply a supervised learning approach to this problem and show that convolutional networks can achieve near-human accuracy and superhuman speed. Accuracy is superior to the popular PCA/ICA method based on precision and recall relative to ground truth annotation by a human expert. These results suggest that convolutional networks are an efficient and flexible tool for the analysis of large-scale calcium imaging data.
Treating Software Defined Networks like Disk Arrays
Zhiyuan Teo, Ken Birman, Noah Apthorpe, Robbert Van Renesse, Vasily Kuksenkov
IEEE NetSoft Conference and Workshops (NetSoft). 2016
Data networks require a high degree of performance and reliability as mission-critical IoT deployments increasingly depend on them. Although performance and fault tolerance can be individually addressed at all levels of the networking stack, few solutions tackle these challenges in an elegant and scalable manner. We propose a redundant array of independent network links (RAIL), adapted from RAID, that combines software-defined networking, disjoint network paths and selective packet processing to improve communications bandwidth and latency while simultaneously providing fault tolerance. Our work shows that the implementation of such a system is feasible without necessitating awareness or changes in the operating systems or hardware of IoT and client devices.

Workshops

A Developer-Friendly Library for Smart Home IoT Privacy-Preserving Traffic Obfuscation
Trisha Datta*, Noah Apthorpe, Nick Feamster
Proceedings of the 2018 Workshop on IoT Security and Privacy (IoT S&P). 2018
The number and variety of Internet-connected devices have grown enormously in the past few years, presenting new challenges to security and privacy. Research has shown that network adversaries can use traffic rate metadata from consumer IoT devices to infer sensitive user activities. Shaping traffic flows to fit distributions independent of user activities can protect privacy, but this approach has seen little adoption due to required developer effort and overhead bandwidth costs. Here, we present a Python library for IoT developers to easily integrate privacy-preserving traffic shaping into their products. The library replaces standard networking functions with versions that automatically obfuscate device traffic patterns through a combination of payload padding, fragmentation, and randomized cover traffic. Our library successfully preserves user privacy and requires approximately 4 KB/s overhead bandwidth for IoT devices with low send rates or high latency tolerances. This overhead is reasonable given normal Internet speeds in American homes and is an improvement on the bandwidth requirements of existing solutions.
Machine Learning DDoS Detection for Consumer Internet of Things Devices
Rohan Doshi*, Noah Apthorpe, Nick Feamster
IEEE Deep Learning and Security Workshop (DLS). 2018
An increasing number of Internet of Things (IoT) devices are connecting to the Internet, yet many of these devices are fundamentally insecure, exposing the Internet to a variety of attacks. Botnets such as Mirai have used insecure consumer IoT devices to conduct distributed denial of service (DDoS) attacks on critical Internet infrastructure. This motivates the development of new techniques to automatically detect consumer IoT attack traffic. In this paper, we demonstrate that using IoT-specific network behaviors (e.g. limited number of endpoints and regular time intervals between packets) to inform feature selection can result in high accuracy DDoS detection in IoT network traffic with a variety of machine learning algorithms, including neural networks. These results indicate that home gateway routers or other network middleboxes could automatically detect local IoT device sources of DDoS attacks using low-cost machine learning algorithms and traffic data that is flow-based and protocol-agnostic.
Cleartext Data Transmissions in Consumer IoT Medical Devices
Daniel Wood*, Noah Apthorpe, Nick Feamster
Workshop on Internet of Things Security and Privacy (IoT S&P). 2017
This paper introduces a method to capture network traffic from medical IoT devices and automatically detect cleartext information that may reveal sensitive medical conditions and behaviors. The research follows a three-step approach involving traffic collection, cleartext detection, and metadata analysis. We analyze four popular consumer medical IoT devices, including one smart medical device that leaks sensitive health information in cleartext. We also present a traffic capture and analysis system that seamlessly integrates with a home network and offers a user-friendly interface for consumers to monitor and visualize data transmissions of IoT devices in their homes.
Closing the Blinds: Four Strategies for Protecting Smart Home Privacy from Network Observers
Noah Apthorpe, Dillon Reisman, Nick Feamster
Workshop on Technology and Consumer Protection (ConPro). 2017
The growing market for smart home IoT devices promises new conveniences for consumers while presenting novel challenges for preserving privacy within the home. Specifically, Internet service providers or neighborhood WiFi eavesdroppers can measure Internet traffic rates from smart home devices and infer consumers' private in-home behaviors. Here we propose four strategies that device manufacturers and third parties can take to protect consumers from side-channel traffic rate privacy threats: 1) blocking traffic, 2) concealing DNS, 3) tunneling traffic, and 4) shaping and injecting traffic. We hope that these strategies, and the implementation nuances we discuss, will provide a foundation for the future development of privacy-sensitive smart homes.
A Smart Home is No Castle: Privacy Vulnerabilities of Encrypted IoT Traffic
Noah Apthorpe, Dillon Reisman, Nick Feamster
Data and Algorithmic Transparency Workshop (DAT). 2016
The increasing popularity of specialized Internet-connected devices and appliances, dubbed the Internet-of-Things (IoT), promises both new conveniences and new privacy concerns. Unlike traditional web browsers, many IoT devices have always-on sensors that constantly monitor fine-grained details of users’ physical environments and influence the devices’ network communications. Passive network observers, such as Internet service providers, could potentially analyze IoT network traffic to infer sensitive details about users. Here, we examine four IoT smart home devices (a Sense sleep monitor, a Nest Cam Indoor security camera, a WeMo switch, and an Amazon Echo) and find that their network traffic rates can reveal potentially sensitive user interactions even when the traffic is encrypted. These results indicate that a technological solution is needed to protect IoT device owner privacy, and that IoT-specific concerns must be considered in the ongoing policy debate around ISP data collection and usage.

Preprints

Analyzing Privacy Policies Using Contextual Integrity Annotations
Yan Shvartzshnaider, Noah Apthorpe, Nick Feamster, Helen Nissenbaum
arXiv & SSRN Preprint. 2018
In this paper, we demonstrate the effectiveness of using the theory of contextual integrity (CI) to annotate and evaluate privacy policy statements. We perform a case study using CI annotations to compare Facebook's privacy policy before and after the Cambridge Analytica scandal. The updated Facebook privacy policy provides additional details about what information is being transferred, from whom, by whom, to whom, and under what conditions. However, some privacy statements prescribe an incomprehensibly large number of information flows by including many CI parameters in single statements. Other statements result in incomplete information flows due to the use of vague terms or omitting contextual parameters altogether. We then demonstrate that crowdsourcing can effectively produce CI annotations of privacy policies at scale. We test the CI annotation task on 48 excerpts of privacy policies from 17 companies with 141 crowdworkers. The resulting high precision annotations indicate that crowdsourcing could be used to produce a large corpus of annotated privacy policies for future research.
Detecting Compressed Cleartext Traffic from Consumer Internet of Things Devices
Daniel Hahn*, Noah Apthorpe, Nick Feamster
arXiv Preprint. 2018
Data encryption is the primary method of protecting the privacy of consumer device Internet communications from network observers. The ability to automatically detect unencrypted data in network traffic is therefore an essential tool for auditing Internet-connected devices. Existing methods identify network packets containing cleartext but cannot differentiate packets containing encrypted data from packets containing compressed unencrypted data, which can be easily recovered by reversing the compression algorithm. This makes it difficult for consumer protection advocates to identify devices that risk user privacy by sending sensitive data in a compressed unencrypted format. Here, we present the first technique to automatically distinguish encrypted from compressed unencrypted network transmissions on a per-packet basis. We apply three machine learning models and achieve a maximum 66.9% accuracy with a convolutional neural network trained on raw packet data. This result is a baseline for this previously unstudied machine learning problem, which we hope will motivate further attention and accuracy improvements. To facilitate continuing research on this topic, we have made our training and test datasets available to the public.
Spying on the Smart Home: Privacy Attacks and Defenses on Encrypted IoT Traffic
Noah Apthorpe, Dillon Reisman, Srikanth Sundaresan, Arvind Narayanan, Nick Feamster
arXiv Preprint. 2017
The growing market for smart home IoT devices promises new conveniences for consumers while presenting new challenges for preserving privacy within the home. Many smart home devices have always-on sensors that capture users' offline activities in their living spaces and transmit information about these activities on the Internet. In this paper, we demonstrate that an ISP or other network observer can infer privacy sensitive in-home activities by analyzing Internet traffic from smart homes containing commercially-available IoT devices even when the devices use encryption. We evaluate several strategies for mitigating the privacy risks associated with smart home device traffic, including blocking, tunneling, and rate-shaping. Our experiments show that traffic shaping can effectively and practically mitigate many privacy risks associated with smart home IoT devices. We find that 40KB/s extra bandwidth usage is enough to protect user activities from a passive network adversary. This bandwidth cost is well within the Internet speed limits and data caps for many smart homes.

Book Chapters

Using Contextual Integrity as a Gauge for Governing Knowledge Commons
Yan Shvartzshnaider, Madelyn Sanfilippo, Noah Apthorpe
In B. Frischmann, M. R. Sanfilippo, K. J. Strandburg (eds.)
Governing Privacy in Knowledge Commons. 2021