Integration of the Peruvian Citizen’s Public Information by Applying Web Scraping Under SCRUM Methodology

Hugo Vega-Huerta, Ronald Cardeña-Ccahuata, Percy De La Cruz Velez de Villa, Ernesto Cancho-Rodriguez, Gisella Luisa Elena Maquen-Niño

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Digital Platform of the Peruvian State is composed mainly of seven websites. To obtain complete information about a citizen, the information must be extracted from each website and integrated manually, which can take more than 3 min. The objective is to centralize the public information from the seven websites in a single web platform by applying web scraping. To implement the web scraping technique, the Selenium tool was used to simulate the query process of a user entering an ID number, and the web platform was developed following the Scrum methodology, divided into three Sprints. As a result, users can view with a simple query the public information of a citizen that is stored and available on different websites, and the average search time for a citizen's information was reduced from 136 to 24 s. In conclusion, web scraping can extract a citizen's information from different governmental websites quickly and completely with a simple query.
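The flow described in the abstract (enter an ID number on each site, read the result, combine everything into one record) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `make_selenium_scraper` and `integrate`, the URL, and the element ids passed to them are all hypothetical, and real government sites would need their own locators and possibly CAPTCHA handling.

```python
from typing import Callable, Dict

# Hypothetical type: a scraper takes an ID number and returns the fields it found.
Scraper = Callable[[str], Dict[str, str]]


def make_selenium_scraper(url: str, input_id: str, result_id: str) -> Scraper:
    """Build a scraper that types the ID number into a form and reads one element.

    The URL and element ids are placeholders (assumptions), not real site details.
    """
    def scrape(id_number: str) -> Dict[str, str]:
        # Imported lazily so the merge logic below runs without Selenium installed.
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.common.keys import Keys
        from selenium.webdriver.support import expected_conditions as EC
        from selenium.webdriver.support.ui import WebDriverWait

        driver = webdriver.Chrome()
        try:
            driver.get(url)
            box = driver.find_element(By.ID, input_id)
            box.send_keys(id_number)
            box.send_keys(Keys.RETURN)  # submit the query as a user would
            # Wait up to 10 s for the result element to appear in the page.
            result = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.ID, result_id))
            )
            return {"result": result.text}
        finally:
            driver.quit()  # always release the browser

    return scrape


def integrate(id_number: str, scrapers: Dict[str, Scraper]) -> Dict[str, Dict[str, str]]:
    """Query every source for one ID number and collect the results by source name."""
    merged: Dict[str, Dict[str, str]] = {}
    for name, scrape in scrapers.items():
        try:
            merged[name] = scrape(id_number)
        except Exception as exc:  # one failing site should not block the others
            merged[name] = {"error": str(exc)}
    return merged
```

Under these assumptions, a platform would register one scraper per website and call `integrate` once per query. Keeping the merge step independent of Selenium also makes it possible to exercise it with stand-in scrapers before any browser automation is wired in.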

Original language: English
Title of host publication: Proceedings of the 9th Brazilian Technology Symposium (BTSym’23) - Emerging Trends and Challenges in Technology
Editors: Yuzo Iano, Rangel Arthur, Osamu Saotome, Guillermo Leopoldo Kemper Vásquez, Maria Thereza de Moraes Gomes Rosa, Gabriel Gomes de Oliveira
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 558-567
Number of pages: 10
ISBN (Print): 9783031669606
State: Published - 2024
Event: 9th Brazilian Technology Symposium on Emerging Trends and Challenges in Technology, BTSym 2023 - Campinas, Brazil
Duration: 24 Oct 2023 to 26 Oct 2023

Publication series

Name: Smart Innovation, Systems and Technologies
Volume: 402 SIST
ISSN (Print): 2190-3018
ISSN (Electronic): 2190-3026

Conference

Conference: 9th Brazilian Technology Symposium on Emerging Trends and Challenges in Technology, BTSym 2023
Country/Territory: Brazil
City: Campinas
Period: 24/10/23 to 26/10/23

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Keywords

  • Government websites
  • Information integration
  • Selenium
  • Tesseract
  • Web scraping
