Trip to the Land of the Automatic Extraction of Data from Documents
We finally did it! About 40 people (including journalists, software programmers, and activists from Argentine human rights organizations) landed last August 13th at the Hacks/Hackers Buenos Aires hackathon, which took place within the mega-exhibition Tecnópolis. The idea was to work all day on Mapa76.info, a software project for the automatic extraction and visualization of data from text documents. The software is focused on analyzing the trials of Argentina's last military dictatorship, which ruled between 1976 and 1983. Journalists and programmers came not only from Buenos Aires but also from Rosario and Córdoba, and were helped by the creators of Junar.com, a data-streaming API for dashboards, who came all the way from Chile to participate in the event and demonstrate their technology.
**The problem to solve:** Argentina currently has a large number of judicial cases linked to the repression carried out under the last military dictatorship: more than 200 convicted persons, dozens of trials in progress, hundreds of witnesses testifying every day, and possibly more than a thousand accused persons implicated in acts of State terrorism between 1976 and 1983. The question is: Can we develop software that finds relationships that people cannot see? Journalists, the courts, and investigators need to identify such relationships between persons, organizations, and places, and to visualize them in timelines and maps.
**What the journalists worked on:** The engine of Mapa76.info, still in its alpha stage, extracts names, places, and dates. At first, the journalists “combed” documents for sentences and allegations establishing relationships between dates and specific events such as kidnappings, torture, transfers, etc., in order to visualize those events in a timeline. Later they thought up possible use cases:
- Who was with whom in a clandestine location?
- Following one person's story: What happened to him or her?
- Writing an article about a person by “combing” all of the documents in which they are mentioned;
- Compare two life stories;
- Compare versions of a story;
- Comb documents in an effort to tell a document-based story;
- Incorporate other sources, such as foreign newspapers;
- Compare two testimonies given by the same person at different times.
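Two of these use cases, following one person's story and finding who was with whom at a location, can be sketched in a few lines once names, places, and dates have been extracted. The example below is only an illustration in Python, not the Mapa76.info code: the `Mention` record, the `timeline` and `together` functions, and the sample data are all hypothetical, and "together" is simplified here to mean "recorded at the same place on the same date".

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    """One extracted fact (hypothetical shape): a person at a place on a date."""
    person: str
    place: str
    when: date
    source: str  # name of the document the fact was extracted from


def timeline(mentions, person):
    """Return one person's mentions in chronological order."""
    return sorted((m for m in mentions if m.person == person),
                  key=lambda m: m.when)


def together(mentions, place):
    """Return sorted pairs of people recorded at `place` on the same date."""
    by_day = defaultdict(set)
    for m in mentions:
        if m.place == place:
            by_day[m.when].add(m.person)
    pairs = set()
    for people in by_day.values():
        pairs.update(combinations(sorted(people), 2))
    return sorted(pairs)


# Invented placeholder data, not taken from any real document.
mentions = [
    Mention("Person A", "Site X", date(1977, 3, 1), "doc1.txt"),
    Mention("Person B", "Site X", date(1977, 3, 1), "doc2.txt"),
    Mention("Person A", "Site Y", date(1977, 5, 9), "doc1.txt"),
    Mention("Person C", "Site X", date(1978, 1, 4), "doc3.txt"),
]

for m in timeline(mentions, "Person A"):
    print(m.when.isoformat(), m.place, m.source)
print(together(mentions, "Site X"))
```

Keeping the source document on every record matters for this kind of work: any relationship the software suggests can then be traced back to the testimony or ruling it came from and checked by a person.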
Led by Martín Sarsale, the programmers worked on improving the interface for loading documents and extracting data, as well as the interface for data display (timelines, maps, and visualization of documents), using Ruby and jQuery. They also worked on improving data “loading” and on converting PDFs into easy-to-use text documents.
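Text pulled out of a PDF usually arrives with hard line breaks and words hyphenated across lines, which gets in the way of finding names and dates. As a rough idea of what "easy-to-use text" can involve, here is a minimal Python sketch of a cleanup step. It is an assumption about the kind of processing needed, not the project's actual converter, and the function name `clean_extracted_text` is hypothetical.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Normalize raw text extracted from a PDF into flowing paragraphs."""
    text = raw.replace("\r\n", "\n")
    # Rejoin words split by a hyphen at a line end: "wit-\nness" -> "witness".
    # Caveat: this also joins genuine compounds that happen to break there.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # A single newline inside a paragraph becomes a space; blank lines stay.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


raw = "The wit-\nness stated\nthat he saw.\n\nSecond  paragraph."
print(clean_extracted_text(raw))
```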
The hackathon relied on support from the National Attorney General's Unit for the Coordination and Tracking of Human Rights Cases (Unidad de Coordinación y Seguimiento de Causas de Derechos Humanos de la Procuración General de la Nación). Later the hackathon got in touch with the coordinating team of the Federal Network of Memorial Sites (Red Federal de Sitios de Memoria) and members of the Argentine Ministry of Education who were interested in the project. Media coverage of the hackathon can be seen at Página/12, the Tecnópolis website, and YouTube.
Among other participants were Joel Matías Silva, Damian Silvani, Lucas Tolchinsky, Nahuel Baglieto, Sergio Sorin, Tania Wassaf, Manuel Milla, Ezequiel Clerici, Guillermo González, Mariano Mancuso, Mariano Zapatero, Luis Guardiola, Matias Iturburu, Javier Ciancio, Gisela Cardozo, Gabriel, Javier Pájaro, Joaquín Nuñez, Rodrigo Aza, Marcos Vanetta, Felipe Lerena, Filippo Fiorini, and the organizing team of Hacks/Hackers Buenos Aires, made up of Mariano Blejman (Página/12), Martín Sarsale (Sumavisos), Guillermo Movia (Mozilla Argentina), César Miquel (Easytech), and Mariana Berruezo. Diego Accorinti did the graphic design of Mapa76.info. Post translated into English by Michael Romano.
Web http://meetupba.hackshackers.com
blog http://www.hackshackers.com
mail ba (at) hackshackers (dot) com
twitter @HacksHackersBA