The “FL_ORGANIZATION_FILE.csv” dataset file is a large 37MB CSV
formatted file from an Apple O/S with 257,567 rows of data plus one header row
and 24 columns. Microsoft’s Excel 2013 can load it despite this large size. Per
Microsoft’s specifications Excel 2016 and Excel 2013 can handle worksheets up
to 1,048,576 rows by 16,384 columns ( source ).
Google Sheets was unable to import the file as a CSV into a
blank worksheet. Sheets can handle up to 2 million cells for spreadsheets and
this file contains more than 6 million cells (257,568 x 24 = 6,181,632) so it
is beyond what Google Sheets can handle. If we wanted to use Google Sheets we
could need to divide up the data to fit below the 2 million cells per file or
identify columns we could exclude from the work. If we could limit ourselves to
7 columns we could fit within the 2 million cell limit in one file, for
example.
Finding ways to visualize this data from a spreadsheet or
CSV form can be a challenge. Trying to figure out what the start with; what is
it we want to find out about this data? Geographic density based on the city,
state, or zipcode? Time-based statistics on start or end dates for
organizations? Interest types? Identifying multiple registrations based on
mailing address, phone number, email, etc?
For one exploration I plotted the distribution of zipcodes
present in the data in R:
No comments:
Post a Comment