A Smarter Way to Analyze Trade
TradeSleuth is a system designed to detect records of interest for further scrutiny, including potential instances of origin misdeclaration, fraud, and illegality with regard to traded commodities.
TradeSleuth has been developed in several phases and draws on established qualitative and quantitative trade analysis methods that span multiple areas of focus in trade economics, supply chain logistics, business relations, machine learning, and political science.
Initial research in these fields built on advances in the theory and political science of international trade and policy, and on the incentives corporate actors face to skirt regulatory and policy measures, for instance by evading tariffs or sanctions (for example, Bhagwati, 1964). Near-global adoption of the World Customs Organization's Harmonized System (HS) codes in 1988, together with public access to both HS and Bill of Lading (BoL) international trade data, enabled the growth of data visualization techniques (e.g., ResourceTrade.Earth, 2022) and of quantitative methods for studying and testing many of the early theoretical underpinnings of international trade theory and political science. The same data support methods for analyzing emerging societal issues, such as the supply chain challenges during the COVID-19 pandemic (Flaaen et al., 2021) and the detection of illicit trade in dual-use commodities that can be used to build weapons of mass destruction (Nelson, 2023).
An expanding area of research in the last decade has involved studying corporate supply chains and trade (MacCarthy et al., 2022; Goldstein and Newell, 2020; Goldstein and Newell, 2019). A subfield has emerged that focuses on identifying illegal shipments and transactions among the millions of goods traded legally around the world, and on the often undetected, out-of-sight environmental and humanitarian impacts of global supply chains (Chamanara et al., 2021; Cho et al., 2022; Cho et al., 2021; Damm et al., 2022; Deconinck and Toyama, 2022; Hedlund et al., 2022; Leijten et al., 2022; zu Ermgassen et al., 2021).
The ability to identify specific shipments that may circumvent economic sanctions, trade restrictions, or high tariff rates, or that are consistent with suspicious or illegal activity, depends largely on how well large volumes of trade data are cleaned, preprocessed, and analyzed.
To process and clean trade data effectively, we developed TradeSweep, a Large Language Model (LLM)-based tool that serves as a key first step in the processing pipeline and leverages the code-writing abilities of LLMs to carry out preprocessing tasks.
TradeSweep stands out for its ability to translate plain English instructions into executable data-cleaning steps, removing the need for technical coding expertise. It identifies and fixes common issues in trade datasets—such as missing values, inconsistent formats, and typos—and shows users a visual preview of the cleaned data before applying changes. It also learns over time, storing successful cleaning routines in a growing code library that makes future tasks faster and more accurate. By combining automation with user feedback, TradeSweep ensures that preprocessing is both efficient and tailored to the specific needs of trade data analysis.
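To make the idea concrete, the following is a minimal sketch of the kind of cleaning routine such a tool might generate from an instruction like "standardize the origin country column". It is not TradeSweep's actual code: the function names, the `KNOWN_COUNTRIES` list, the `origin` field, and the `UNKNOWN` marker are all assumptions made for illustration. It shows a cleaner for one field, a preview of the changes it would make, and a separate step that applies them.

```python
from difflib import get_close_matches

# Hypothetical reference vocabulary; a real pipeline would use a fuller list.
KNOWN_COUNTRIES = ["China", "Vietnam", "Russia", "Indonesia"]


def clean_country(value, known=KNOWN_COUNTRIES, cutoff=0.8):
    """Normalize one country field: trim, fix case, repair near-miss typos."""
    if value is None or not value.strip():
        return "UNKNOWN"  # explicit marker for missing values (an assumption)
    candidate = " ".join(value.split()).title()
    match = get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return match[0] if match else candidate


def preview(records, field, cleaner):
    """Return (before, after) pairs for the values the cleaner would change."""
    changes = []
    for rec in records:
        before = rec.get(field)
        after = cleaner(before)
        if before != after:
            changes.append((before, after))
    return changes


def apply_cleaning(records, field, cleaner):
    """Return cleaned copies of the records, leaving the input untouched."""
    return [{**rec, field: cleaner(rec.get(field))} for rec in records]


# Hypothetical records with a padded value, a typo, and a missing value.
records = [
    {"origin": "  china "},
    {"origin": "Vietnm"},
    {"origin": None},
    {"origin": "Russia"},
]

for before, after in preview(records, "origin", clean_country):
    print(repr(before), "->", repr(after))

cleaned = apply_cleaning(records, "origin", clean_country)
```

Keeping the preview separate from the apply step mirrors the review-before-commit workflow described above. A routine that the user accepts could then be saved under its instruction text for later reuse.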
TradeSleuth has been developed to analyze the underlying Bill of Lading (BoL) data at two levels of granularity.
First, distributional analysis identifies significant differences in export BoL datasets in terms of countries, ports of lading, and trading volumes before and after the onset of key geopolitical events. This approach examines trade data in aggregate, focusing on changes over time in key variables such as countries of origin and destination, ports of lading, and trading volumes. By comparing these distributions before and after major events, such as the onset of a conflict, it becomes possible to detect significant shifts that may signal broader systemic changes or unusual trade behavior.
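One simple way to quantify such a before-and-after shift is a divergence between the two empirical distributions. The sketch below uses the Jensen-Shannon divergence as an illustrative choice; the source does not state which statistic TradeSleuth uses, and the port names and counts are invented for the example.

```python
import math
from collections import Counter


def distribution(values):
    """Turn a list of categorical values into relative frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Ranges from 0 (identical) to 1 (no shared support).
    """
    keys = set(p) | set(q)
    # Mixture distribution: the midpoint of p and q.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL divergence from a to the mixture m; zero-probability terms add 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical ports of lading recorded before and after an event.
before = ["St. Petersburg"] * 70 + ["Vladivostok"] * 30
after = ["St. Petersburg"] * 20 + ["Vladivostok"] * 50 + ["Istanbul"] * 30

shift = js_divergence(distribution(before), distribution(after))
print(f"Jensen-Shannon divergence: {shift:.3f}")
```

A divergence on its own does not establish significance. In practice it would be paired with a reference distribution, for example one obtained by recomputing the statistic over randomly shuffled before and after labels.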
Second, anomaly detection at the individual trade record level is conducted using deep learning methods. Much as modern search engines and AI chatbots make sense of language by learning the company a word keeps, TradeSleuth uses distributional representations to understand relationships between different parts of a record. These methods learn what "typical" shipments look like and flag records that deviate from those patterns.
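The "company it keeps" intuition can be illustrated without a neural network. The sketch below is a deliberately simple count-based stand-in, not TradeSleuth's deep learning model: it scores a record by the average smoothed pointwise mutual information (PMI) of its field-value pairs, so that combinations rarely seen together score low. The field names, the sample records, and the smoothing constant `alpha` are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def fit_counts(records):
    """Count how often each field value and each pair of field values occurs."""
    single, pair = Counter(), Counter()
    for rec in records:
        items = sorted(rec.items())  # fixed field order so pair keys match
        single.update(items)
        pair.update(combinations(items, 2))
    return single, pair, len(records)


def typicality(rec, single, pair, n, alpha=1.0):
    """Mean smoothed PMI over field-value pairs; lower means more unusual.

    Assumes the record has at least two fields.
    """
    items = sorted(rec.items())
    scores = []
    for a, b in combinations(items, 2):
        p_ab = (pair[(a, b)] + alpha) / (n + alpha)
        p_a = (single[a] + alpha) / (n + alpha)
        p_b = (single[b] + alpha) / (n + alpha)
        scores.append(math.log(p_ab / (p_a * p_b)))
    return sum(scores) / len(scores)


# Hypothetical training data: two common shipment patterns.
history = (
    [{"hs_code": "4407", "origin": "Russia", "port": "St. Petersburg"}] * 50
    + [{"hs_code": "4403", "origin": "Brazil", "port": "Santos"}] * 50
)
single, pair, n = fit_counts(history)

typical = {"hs_code": "4407", "origin": "Russia", "port": "St. Petersburg"}
odd = {"hs_code": "4407", "origin": "Russia", "port": "Santos"}

print("typical:", round(typicality(typical, single, pair, n), 3))
print("odd:    ", round(typicality(odd, single, pair, n), 3))
```

Here the second record reuses familiar values in an unfamiliar combination, so it scores lower than the first. Learned embeddings generalize the same idea to rare values and to interactions among more than two fields.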
The TradeSleuth team has received funding support and research partnerships from WWF US and TRAFFIC (2018–2021), World Forest ID (2022–2024), and the Natural Resources Defense Council (2025–present). In addition to the generous support and collaboration from these organizations, we also wish to acknowledge the invaluable contributions of the students who have advanced this vision along the way, including Debanjan Datta, Amanda Lee, Jay Katyan, Andrew Neeser, and numerous others whose efforts have shaped the direction and impact of our work.
We remain actively interested in identifying new project partners and funders who share our commitment to advancing responsible and data-driven trade enforcement. If you are interested in collaborating, please don’t hesitate to contact us.
C.T. Lee, A. Neeser, S. Xu, J. Katyan, P. Cross, S. Pathakota, M. Norman, J.C. Simeone, and N. Ramakrishnan, Can an LLM find its way around a Spreadsheet?, in Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025), Ottawa, Canada, Apr-May 2025. [Paper]
D. Datta, J.C. Simeone, A. Meadows, W. Outhwaite, C.H. Keong, N. Self, L. Walker, and N. Ramakrishnan, Combating Trade in Illegal Wood and Forest Products with Machine Learning, PLoS ONE, Vol. 20, No. 1, 24 pages, Jan 2025. [Paper]
M. Norman (Walkins), Tracking Russian Birch. World Forest ID Insight Briefing, Sep 2024. [Link]
D. Datta, N.W. Self, J. Simeone, A. Meadows, W. Outhwaite, N. Elmqvist, and N. Ramakrishnan, TimberSleuth: Visual Anomaly Detection with Human Feedback for Mitigating the Illegal Timber Trade, Information Visualization, Vol. 22, No. 3, pages 223-245, July 2023. [Paper]
D. Datta, F. Chen, and N. Ramakrishnan, Framing Algorithmic Recourse for Anomaly Detection, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’22), pages 282-293, Aug 2022. [Paper]
D. Datta, S. Muthiah, J. Simeone, A. Meadows, and N. Ramakrishnan, Scrutinizing Shipment Records to Thwart Illegal Timber Trade, in Proceedings of the Outlier Detection and Description Workshop, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’21), Aug 2021. [Paper]
D. Datta, M.R. Islam, N. Self, A. Meadows, J. Simeone, W. Outhwaite, C.H. Keong, A. Smith, L. Walker, and N. Ramakrishnan, Detecting Suspicious Timber Trades, in Proceedings of the Thirty-Second Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-20), pages 13248-13254, New York, NY. [Paper]