Stats, Volume 3, Issue 4 (December 2020) – 7 articles
Cover Story:
The analysis of massive databases is a key issue for most applications today, and parallel computing techniques are one suitable approach to it. One way to perform statistical analyses over massive databases is to combine tools via the sparklyr package, which allows an R application to use Apache Spark as a framework. This paper presents an analysis of Brazilian public data from the Bolsa Família Programme (BFP, a conditional cash transfer program), comprising local processing of a large data set with 1.26 billion observations totaling more than 100 GB. Our goal was to understand how this social program acts in different cities, as well as to identify variables potentially important to the BFP utilization rate. The analysis was performed with random forests (RF) and indicated the high importance of some variables such as family income, education, occupation, and density of people in
[...]