Abstract
The Digital Platform of the Peruvian State is mainly composed of seven websites. To obtain complete information about a citizen, information must be extracted from each website and integrated manually, which can take more than 3 min. The objective is to centralize the public information from the seven websites in a single web platform by applying web scraping. To implement the web scraping technique, the Selenium tool was used to simulate the query process of a user entering an ID number, and the web platform was developed following the Scrum methodology across three Sprints. As a result, users can view, with a single query, the public information about a citizen that is stored on different websites, and the average time to search for a citizen's information fell from 136 to 24 s. In conclusion, web scraping can extract a citizen's information from different government websites quickly and completely with a single query.
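The integration step described in the abstract, one ID number fanned out to several sources and merged into a single record, can be illustrated with a minimal sketch. This is not the authors' implementation: the scraper functions, source names, and returned fields below are hypothetical placeholders, and in a real system each scraper would presumably wrap a Selenium-driven browser session rather than return fixed data.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for per-site scrapers. Each one takes an ID number
# and returns the fields found on that site. The fixed return values are
# placeholders, not real data or real site structure.
def scrape_identity(id_number: str) -> dict:
    return {"full_name": "EXAMPLE NAME"}

def scrape_tax(id_number: str) -> dict:
    return {"tax_status": "ACTIVE"}

def scrape_offline_site(id_number: str) -> dict:
    # Simulates a source that fails, e.g. because the site is down.
    raise TimeoutError("site did not respond")

# Registry mapping a source name (hypothetical) to its scraper function.
SOURCES: Dict[str, Callable[[str], dict]] = {
    "identity": scrape_identity,
    "tax": scrape_tax,
    "offline_site": scrape_offline_site,
}

def query_citizen(id_number: str, sources: Dict[str, Callable[[str], dict]]) -> dict:
    """Run every scraper for one ID number and merge the results.

    A failure in one source is recorded under "errors" instead of
    aborting the whole query, so partial results are still returned.
    """
    record = {"id_number": id_number, "sources": {}, "errors": {}}
    for name, scraper in sources.items():
        try:
            record["sources"][name] = scraper(id_number)
        except Exception as exc:
            record["errors"][name] = str(exc)
    return record

result = query_citizen("12345678", SOURCES)
print(result)
```

Keeping the scrapers behind a common function signature means a source can be added or replaced without touching the merge logic, and isolating errors per source lets the platform show whatever information was retrieved even when one website is unavailable.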
Original language | English |
---|---|
Title of host publication | Proceedings of the 9th Brazilian Technology Symposium (BTSym’23) - Emerging Trends and Challenges in Technology |
Editors | Yuzo Iano, Rangel Arthur, Osamu Saotome, Guillermo Leopoldo Kemper Vásquez, Maria Thereza de Moraes Gomes Rosa, Gabriel Gomes de Oliveira |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 558-567 |
Number of pages | 10 |
ISBN (Print) | 9783031669606 |
State | Published - 2024 |
Event | 9th Brazilian Technology Symposium on Emerging Trends and Challenges in Technology, BTSym 2023 - Campinas, Brazil. Duration: 24 Oct 2023 → 26 Oct 2023 |
Publication series
Name | Smart Innovation, Systems and Technologies |
---|---|
Volume | 402 SIST |
ISSN (Print) | 2190-3018 |
ISSN (Electronic) | 2190-3026 |
Conference
Conference | 9th Brazilian Technology Symposium on Emerging Trends and Challenges in Technology, BTSym 2023 |
---|---|
Country/Territory | Brazil |
City | Campinas |
Period | 24/10/23 → 26/10/23 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
Keywords
- Government websites
- Information integration
- Selenium
- Tesseract
- Web scraping