Getting and Keeping Data

Data comes in many forms, from many sources: you may get a database dump directly from a project partner, or you may need to scrape data from the web (see Basic Web Scraping). Either way, once you have your hands on some data, you'll need to bring it into a database and start formatting it so that you can use it for analysis. Command Line Tools will start to come in handy here. If your data is in a format that resembles CSV, these instructions will be helpful. You'll definitely want to keep track of the steps you took to go from raw data to model-ready data (Reproducible ETL).
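As a rough illustration of the "bring it into a database" step, here is a minimal sketch that loads CSV data into a staging table where every column is text, leaving type cleanup for later. It uses Python's built-in `sqlite3` module purely as a stand-in for whatever database your project uses; the `load_csv` helper, the `raw_visits` table name, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Load rows from an open CSV file into a new all-TEXT staging table."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first row is assumed to hold the column names
    # Quote identifiers so column names containing spaces still work.
    cols = ", ".join('"{}" TEXT'.format(c.replace('"', '""')) for c in header)
    placeholders = ", ".join("?" for _ in header)
    tbl = table.replace('"', '""')
    conn.execute('CREATE TABLE "{}" ({})'.format(tbl, cols))
    # Values are passed as parameters, never spliced into the SQL string.
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(tbl, placeholders), reader
    )
    conn.commit()
    return header


# Hypothetical sample data standing in for a partner's CSV export.
sample = io.StringIO("id,name,visits\n1,Ada,3\n2,Grace,5\n")
conn = sqlite3.connect(":memory:")
columns = load_csv(conn, "raw_visits", sample)
row_count = conn.execute("SELECT COUNT(*) FROM raw_visits").fetchone()[0]
```

Loading everything as text first keeps the raw data intact, so type conversions and cleaning can be written as separate, repeatable steps.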

Data science for social good projects often involve sensitive data, so it's important to be aware of some basic principles of data security: Data Security Primer.