Fiscal Data from Audit Reports with NLP and ML
Updated: Oct 26, 2020
This project uses a collection of audit reports scraped from the website of the Auditoria Superior del Estado de Sinaloa (the Sinaloa State Audit Office). The reports detail the spending of each municipality from 2008 through 2016. The objective is to identify the municipalities in which the auditor reported major observations and the monetary amount of those discrepancies. The project is broken into five sections:
1. Preprocessing documents
2. Implementing an LDA topic modeller
3. Implementing a text classification model
4. Extracting information using NER
5. Results
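To make the end goal concrete (recovering the monetary amount attached to an auditor's observation), here is a minimal sketch that pulls peso amounts out of a sentence. It uses a plain regular expression as a stand-in, not the NER model the project relies on. The function name `extract_amounts`, the pattern, and the sample sentence are all assumptions made for illustration and are not taken from the notebook.

```python
import re
from decimal import Decimal

# Hypothetical pattern: a "$" sign followed by digits with optional
# thousands separators and up to two decimals, e.g. "$1,234,567.89".
AMOUNT_RE = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?")


def extract_amounts(text: str) -> list[Decimal]:
    """Return every monetary amount found in `text` as a Decimal."""
    amounts = []
    for match in AMOUNT_RE.finditer(text):
        whole = match.group(1).replace(",", "")  # drop thousands separators
        cents = match.group(2) or "0"            # no decimals means zero cents
        amounts.append(Decimal(f"{whole}.{cents}"))
    return amounts


# Made-up sentence in the style of an audit observation (not real data).
sample = "Se observó un importe de $1,234,567.89 sin comprobar y $500 pendientes."
print(extract_amounts(sample))
```

A regex like this only finds the numbers. It says nothing about which municipality or observation an amount belongs to, which is the kind of context an NER step would be expected to supply.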
Please see the code and presentation at this interactive Google Colab notebook.