Aashish Kolluri | Publications

2024

IEEE S&P

Attacking Byzantine Robust Aggregation in High Dimensions

Sarthak Choudhary*, Aashish Kolluri*, and Prateek Saxena

IEEE S&P 2024 "Efficient & Trustworthy Distributed Machine Learning"

Abs PDF Code Website

Training modern neural networks or models typically requires averaging over a sample of high-dimensional vectors. Poisoning attacks can skew or bias the average vectors used to train the model, forcing the model to learn specific patterns or avoid learning anything useful. Byzantine robust aggregation is a principled algorithmic defense against such biasing. Robust aggregators can bound the maximum bias in computing centrality statistics, such as mean, even when some fraction of inputs are arbitrarily corrupted. Designing such aggregators is challenging when dealing with high dimensions. However, the first polynomial-time algorithms with strong theoretical bounds on the bias have recently been proposed. Their bounds are independent of the number of dimensions, promising a conceptual limit on the power of poisoning attacks in their ongoing arms race against defenses. In this paper, we show a new attack called HIDRA on practical realization of strong defenses which subverts their claim of dimension-independent bias. HIDRA highlights a novel computational bottleneck that has not been a concern of prior information-theoretic analysis. Our experimental evaluation shows that our attacks almost completely destroy the model performance, whereas existing attacks with the same goal fail to have much effect. Our findings leave the arms race between poisoning attacks and provable defenses wide open.
arXiv

Scalable Neural Network Training over Distributed Graphs

Aashish Kolluri*, Sarthak Choudhary*, Bryan Hooi, and Prateek Saxena

Arxiv 2024 "Efficient & Trustworthy Distributed Machine Learning"

Abs PDF Code Website

Graph neural networks offer a promising approach to supervised learning over graph data. Graph data, especially when it is privacy-sensitive or too large to train on centrally, is often stored partitioned across disparate processing units (clients) which want to minimize the communication costs during collaborative training. The fully-distributed setup takes such partitioning to its extreme, wherein features of only a single node and its adjacent edges are kept locally with one client processor. Existing GNNs are not architected for training in such setups and incur prohibitive costs therein. We propose RETEXO, a novel transformation of existing GNNs that improves the communication efficiency during training in the fully-distributed setup. We experimentally confirm that RETEXO offers up to 6 orders of magnitude better communication efficiency even when training shallow GNNs, with a minimal trade-off in accuracy for supervised node classification tasks.

2023

TPDP

Per-User Histograms in the Shuffle Model

Aashish Kolluri, Jacob Imola, and Amrita Roychowdhury

TPDP 2023 "Efficient & Trustworthy Distributed Machine Learning"

Abs PDF Code Website

We study the problem of estimating per-user histograms with differential privacy under the shuffle DP model. In contrast to the widely studied aggregation-based histograms (a single histogram aggregated over all users), we focus on modeling individual user behavior. For instance, Google uses this query to identify popular topics of websites in a user’s browsing history. We investigate the question whether the shuffle DP model result in privacy amplification for the per-user histogram query. We propose a new mechanism, which combines the Laplace mechanism with shuffling of users’ local DP outputs. The key idea is to cluster users based on their outputs and shuffle the outputs of users within the same cluster. Our approach yields promising results in terms of privacy and utility.
OOPSLA

User-customizable Transpilation of Scripting Languages

Bo Wang, Aashish Kolluri, Ivica Nikolic, Teodora Baluta, and Prateek Saxena

OOPSLA 2023 "Program Synthesis, Translation, and Debugging"

Abs PDF Code Website

A transpiler converts code from one programming language to another. Many practical uses of transpilers require the user to be able to guide or customize the program produced from a given input program. This customizability is important for satisfying many application-specific goals for the produced code such as ensuring performance, readability, maintainability, compatibility, and so on. Conventional transpilers are deterministic rule-driven systems often written without offering customizability per user and per program. Recent advances in transpilers based on neural networks offer some customizability to users, e.g. through interactive prompts, but they are still difficult to precisely control the production of a desired output. Both conventional and neural transpilation also suffer from the "last mile" problem: they produce correct code on average, i.e., on most parts of a given program, but not necessarily for all parts of it. We propose a new transpilation approach that offers fine-grained customizability and reusability of transpilation rules created by others, without burdening the user to understand the global semantics of the given source program. Our approach is mostly automatic and incremental, i.e., constructs translation rules needed to transpile the given program as per the user’s guidance piece-by-piece. We implement the transpiler as a tool called DuoGlot, which translates Python to Javascript programs, and evaluate it on the popular GeeksForGeeks benchmarks. DuoGlot achieves 90% translation accuracy and so it outperforms all existing translators, while it produces readable code. We evaluate DuoGlot on two additional benchmarks, containing more challenging and longer programs, and similarly observe improved accuracy.

2022

CCS

LPGNet: Link Private Graph Networks for Node Classification

Aashish Kolluri, Teodora Baluta, Bryan Hooi, and Prateek Saxena

CCS 2022 "Efficient & Trustworthy Distributed Machine Learning"

Abs PDF Code Website

Classification tasks on labeled graph-structured data have many important applications ranging from social recommendation to financial modeling. Deep neural networks are increasingly being used for node classification on graphs, wherein nodes with similar features have to be given the same label. Graph convolutional networks (GCNs) are one such widely studied neural network architecture that perform well on this task. However, powerful link-stealing attacks on GCNs have recently shown that even with black-box access to the trained model, inferring which links (or edges) are present in the training graph is practical. In this paper, we present a new neural network architecture called LPGNet for training on graphs with privacy-sensitive edges. LPGNet provides differential privacy guarantees for edges using a novel design for how graph edge structure is used during training. We empirically show that LPGNet models often lie in the sweet spot between providing privacy and utility: They can offer better utility than "trivially" private architectures which use no edge information (e.g. vanilla MLPs) and better resilience against existing link-stealing attacks than vanilla GCNs which use the full edge structure. LPGNet also offers consistently better privacy-utility tradeoffs than DPGCN, which is the state-of-the-art mechanism for retrofitting differential privacy into conventional GCNs, in most of our evaluated datasets.

2021

CCS

Private Hierarchical Clustering in Federated Networks

Aashish Kolluri, Teodora Baluta, and Prateek Saxena

CCS 2021 "Efficient & Trustworthy Distributed Machine Learning"

Abs PDF Code Website

Analyzing structural properties of social networks, such as identifying their clusters or finding their central nodes, has many applications. However, these applications are not supported by federated social networks that allow users to store their social contacts locally on their end devices. In the federated regime, users want access to personalized services while also keeping their social contacts private. In this paper, we take a step towards enabling analytics on federated networks with differential privacy guarantees about protecting the user’s social contacts. Specifically, we present the first work to compute hierarchical cluster trees using local differential privacy. Our algorithms for computing them are novel and come with theoretical bounds on the quality of the trees learned. Empirically, our differentially private algorithms learn trees that are of comparable quality (with at most about 10% utility loss) to the trees obtained from the non-private algorithms, while having reasonable privacy (0.5 łeq ε łeq 2). Private hierarchical cluster trees enable new application setups where a service provider can query the community structure around a target user without having their social contacts. We show the utility of such queries by redesigning two state-of-the-art social recommendation algorithms for the federated social network setup. Our recommendation algorithms significantly outperform the baselines that do not use social contacts.
FSE

SynGuar: Guaranteeing Generalization in Programming by Example

Bo Wang, Teodora Baluta, Aashish Kolluri, and Prateek Saxena

ESEC/FSE 2021 "Program Synthesis, Translation, and Debugging"

Abs PDF Code Website

Programming by Example (PBE) is a program synthesis paradigm in which the synthesizer creates a program that matches a set of given examples. In many applications of such synthesis (e.g., program repair or reverse engineering), we are to reconstruct a program that is close to a specific target program, not merely to produce some program that satisfies the seen examples. In such settings, we wish that the synthesized program generalizes well, i.e., has as few errors as possible on the unobserved examples capturing the target function behavior. In this paper, we propose the first framework (called SynGuar) for PBE synthesizers that guarantees to achieve low generalization error with high probability. Our main contribution is a procedure to dynamically calculate how many additional examples suffice to theoretically guarantee generalization. We show how our techniques can be used in 2 well-known synthesis approaches: PROSE and STUN (synthesis through unification), for common string-manipulation program benchmarks. We find that often a few hundred examples suffice to provably bound generalization error below 5% with high (≥ 98%) probability on these benchmarks. Further, we confirm this empirically: SynGuar significantly improves the accuracy of existing synthesizers in generating the right target programs. But with fewer examples chosen arbitrarily, the same baseline synthesizers (without SynGuar) overfit and lose accuracy.
AsiaCCS

Localizing Vulnerabilities Statistically From One Exploit

Shiqi Shen, Aashish Kolluri, Zhen Dong, Prateek Saxena, and Abhik Roychoudhury

AsiaCCS 2021 "Program Synthesis, Translation, and Debugging"

Abs PDF Code Website

Automatic vulnerability diagnosis can help security analysts identify and, therefore, quickly patch disclosed vulnerabilities. The vulnerability localization problem is to automatically find a program point at which the "root cause" of the bug can be fixed. This paper employs a statistical localization approach to analyze a given exploit. Our main technical contribution is a novel procedure to systematically construct a test-suite which enables high-fidelity localization. We build our techniques in a tool called VulnLoc which automatically pinpoints vulnerability locations, given just one exploit, with high accuracy. VulnLoc does not make any assumptions about the availability of source code, test suites, or specialized knowledge of the type of vulnerability. It identifies actionable locations in its Top-5 outputs, where a correct patch can be applied, for about 88% of 43 CVEs arising in large real-world applications we study. These include 6 different classes of security flaws. Our results highlight the under-explored power of statistical analyses, when combined with suitable test-generation techniques.

2019

ISSTA

Exploiting the Laws of Order in Smart Contracts

Aashish Kolluri, Ivica Nikolic, Ilya Sergey, Aquinas Hobor, and Prateek Saxena

ISSTA 2019 "Security of Decentralized Applications"

Abs PDF Code Website

We investigate a family of bugs in blockchain-based smart contracts, which we dub event-ordering (or EO) bugs. These bugs are intimately related to the dynamic ordering of contract events, i.e. calls of its functions, and enable potential exploits of millions of USD worth of crypto-coins. Previous techniques to detect EO bugs have been restricted to those bugs that involve just one or two event orderings. Our work provides a new formulation of the general class of EO bugs arising in long permutations of such events by using techniques from concurrent program analysis. The technical challenge in detecting EO bugs in blockchain contracts is the inherent combinatorial blowup in path and state space analysis, even for simple contracts. We propose the first use of partial-order reduction techniques, using automatically extracted happens-before relations along with several dynamic symbolic execution optimizations. We build EthRacer, an automatic analysis tool that runs directly on Ethereum bytecode and requires no hints from users. It flags 8% of over 10, 000 contracts analyzed, providing compact event traces (witnesses) that human analysts can examine in only a few minutes per contract. More than half of the flagged contracts are likely to have unintended behaviour.

2018

ACSAC

Finding The Greedy, Prodigal, and Suicidal Contracts at Scale

Ivica Nikolić, Aashish Kolluri, Ilya Sergey, Prateek Saxena, and Aquinas Hobor

ACSAC 2018 "Security of Decentralized Applications"

Abs PDF Code Website

Smart contracts—stateful executable objects hosted on blockchains like Ethereum—carry billions of dollars worth of coins and cannot be updated once deployed. We present a new systematic characterization of a class of trace vulnerabilities, which result from analyzing multiple invocations of a contract over its lifetime. We focus attention on three example properties of such trace vulnerabilities: finding contracts that either lock funds indefinitely, leak them carelessly to arbitrary users, or can be killed by anyone. We implemented Maian, the first tool for specifying and reasoning about trace properties, which employs interprocedural symbolic analysis and concrete validator for exhibiting real exploits. Our analysis of nearly one million contracts flags 34, 200 (2, 365 distinct) contracts vulnerable, in 10 seconds per contract. On a subset of 3, 759 contracts which we sampled for concrete validation and manual analysis, we reproduce real exploits at a true positive rate of 89%, yielding exploits for 3, 686 contracts. Our tool finds exploits for the infamous Parity bug that indirectly locked $200 million US worth in Ether, which previous analyses failed to capture.
ICISS

(Invited Paper) on the Security of Blockchain Consensus Protocols

Sourav Das, Aashish Kolluri, Prateek Saxena, and Haifeng Yu

ICISS 2018 "Security of Decentralized Applications"

Abs Website

In the last decade, several permissionless proof-of-work blockchain protocols have focused on scalability. Since these protocols are very difficult to change once deployed, their robustness and security are of paramount importance. This paper summarizes the desired end properties of blockchain consensus protocols and sheds light on the critical role of theoretical analyses of their design. We summarize the major paradigms in prior constructions and discuss open issues in this space.