Reorder the Exporting Data/Project chapter #169

Merged
merged 2 commits into from
Jul 24, 2023
133 changes: 84 additions & 49 deletions episodes/06-saving.md
@@ -6,54 +6,104 @@ exercises: 5

::::::::::::::::::::::::::::::::::::::: objectives

- Save an OpenRefine project.
- Export cleaned data from an OpenRefine project.
- Save an OpenRefine project as a shareable file.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: questions

- How can we save and export our cleaned data from OpenRefine?
- How can we get our cleaned data out of OpenRefine?
- How can we save the whole project with all history as a file?

::::::::::::::::::::::::::::::::::::::::::::::::::

## Saving and Exporting a Project
## Exporting Cleaned Data

Once you have completed the cleaning steps, you will probably want to save the cleaned
dataset as a new file, so that you can further analyse the data using other
applications.
OpenRefine allows you to do so by *exporting* the data in various file formats.

1. Click `Export` in the top right and select the file type you want to export
the data in. `Tab-separated values` (`tsv`) or `Comma-separated values`
(`csv`) would be good choices.
2. OpenRefine creates a file whose name is based on the project name and asks
the browser to download it.
Depending on your browser settings, this file is automatically saved in the
default location for downloaded files, or you see a dialog window to choose
where you want to save the file.

The downloaded file can then be opened in a spreadsheet program or imported into
programs written in R or Python, for example.
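
As a minimal sketch of the Python route, the function below reads an exported
file with the standard `csv` module. The function name `read_exported_table`
and the file name `my-project.tsv` are hypothetical, and the sketch assumes the
export is UTF-8 encoded with a header row.

```python
import csv
from pathlib import Path


def read_exported_table(path, delimiter="\t"):
    """Read an exported table into a list of dictionaries, one per row."""
    # newline="" lets the csv module handle line breaks inside quoted cells.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


# Hypothetical file name: replace it with the path of your own download.
export_path = Path("my-project.tsv")
if export_path.exists():
    rows = read_exported_table(export_path)
    print(f"Read {len(rows)} rows")
```

For a `csv` export, pass `delimiter=","` instead. All values arrive as text, so
numeric columns still need to be converted before doing calculations.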

Remember from our lesson on Spreadsheets that using widely-supported,
non-proprietary file formats like `tsv` or `csv` makes it easier for you and
others to use your data.

In OpenRefine you can save or export the project. This means you're saving the
::::::::::::::::::::::::::: callout

### Only matching rows are exported

OpenRefine only operates on rows that match all enabled filters.
This is also true for exporting data.
So if you want to export a selection from a larger dataset, you can use filters
and facets to select what data you want to export.

However, if you want to export all data but forget to reset all facets and filters,
the exported dataset will be incomplete.
OpenRefine does not provide a warning about enabled filters when you export data.

:::::::::::::::::::::::::::::::::::
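
Because no warning is shown, it can help to check the row count of an export
yourself. The following is a rough sketch, not an OpenRefine feature: the
function names are hypothetical, and `expected_rows` is assumed to be the total
row count that you noted down from the project before exporting.

```python
import csv


def count_data_rows(path, delimiter="\t"):
    """Count the rows in an exported file, not counting the header line."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # skip the header row, if there is one
        return sum(1 for _ in reader)


def export_looks_complete(path, expected_rows, delimiter="\t"):
    """Return True when the export has exactly the expected number of rows."""
    return count_data_rows(path, delimiter) == expected_rows
```

If `export_looks_complete` returns `False` for a file that was meant to hold
the full dataset, a facet or filter was probably still active during export.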


## Saving a Project as a File

Besides exporting the data, you can also export the whole project.
When you export the project, OpenRefine creates a single file that includes the
data and all the information about the cleaning and data transformation steps
you've done. Once you've saved a project, you can open it up again and be just
where you stopped before.
that you have taken.

You can use this file as a project backup, transfer it to another computer to
continue working on the data or share it with a collaborator who can open it
to see what you did and continue the work.

### Saving
::::::::::::::::::::::::::: callout

### Saving happens automatically

By default, OpenRefine saves your project continuously while you work on it.
If you close OpenRefine and open it up again, you can see a list of your
projects when you select "Open Project" on the start screen.
You can open an existing project by clicking on its title.

:::::::::::::::::::::::::::::::::::

By default OpenRefine is saving your project continuously. If you close
OpenRefine and open it up again, you'll see a list of your projects. You can
click on any one of them to open it up again.

::::::::::::::::::::::::: challenge

### Exporting the project
### Exporting and examining the project

You can also export a project. This is helpful, for instance, if you wanted to
send your raw data and cleaning steps to a collaborator, or share this
information as a supplement to a publication.
In this exercise, we will export the project and examine the contents of the
exported file.

1. Click the `Export` button in the top right and select `OpenRefine project archive to file`.
2. A `tar.gz` file will download to your default `Download` directory.
Depending on your browser you may have to confirm that you want to save the
file. The `tar.gz` extension tells you that this is a compressed file. The
downloaded `tar.gz` file is actually a folder of files which have been
compressed. Linux and Mac machines will have software installed to
automatically expand this type of file when you double-click on it. For
Windows based machines you may have to install a utility like '7-zip' in
order to expand the file and see the files in the folder.
3. After you have expanded the file look at the files that appear in this
folder. What files are here? What information do you think these files
contain?
2. OpenRefine then presents a `tar.gz` file for download.
Depending on your browser you may have to specify where you want to save the
file, or it may be downloaded to your default directory for downloaded files.
The `tar.gz` extension tells you that this is a compressed archive: a
folder of files that has been bundled and compressed into a single file.
Linux and Mac machines have software installed that expands this type of
file when you double-click on it. On Windows-based machines you may have
to install a utility like '7-zip' in order to expand the file and see the
files in the folder.
3. After you have expanded the file, look at the files that appear in this
folder. What files are here? What information do you think these files
contain?
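
If you prefer not to install an extra utility, the archive can also be
inspected from Python. This is a small sketch using the standard `tarfile`
module; the function name `list_archive_members` is hypothetical, and you need
to supply the path of your own downloaded file.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the names of all files and folders inside a tar.gz archive."""
    # "r:gz" opens a gzip-compressed tar archive for reading only.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

Calling the function only reads the list of names; it does not unpack the
archive or change the downloaded file.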

::::::::::::::: solution

## Solution
### Solution

You should see:

@@ -69,33 +119,18 @@ You should see:

:::::::::::::::::::::::::::::::::::

You can import an existing project into OpenRefine by clicking `Open...` in the
upper right > `Import Project` and selecting the `tar.gz` project file. This
project will include all of the raw data and cleaning steps that were part of
the original project.

## Exporting Cleaned Data

You can also export just your cleaned data, rather than the entire project.

1. Click `Export` in the top right and select the file type you want to export
the data in. `Tab-separated values` (`tsv`) or `Comma-separated values`
(`csv`) would be good choices.
2. That file will be exported to your default `Download` directory. That file
can then be opened in a spreadsheet program or imported into programs like R
or Python, which we'll be discussing later in our workshop.

Remember from our lesson on Spreadsheets that using widely-supported,
non-proprietary file formats like `tsv` or `csv` improves the ability of
yourself and others to use your data.
### Importing a Project

You can import an existing project into OpenRefine by clicking `Open...` in the
upper right, then opening the `Import Project` tab and selecting the `tar.gz`
project file.


:::::::::::::::::::::::::::::::::::::::: keypoints

- Cleaned data or entire projects can be exported from OpenRefine.
- Projects can be shared with collaborators, enabling them to see, reproduce and check all data cleaning steps you performed.
- Cleaned data, or selected data, can be exported from OpenRefine
for use in other applications.
- Projects can be exported to files that contain the original data
and all data cleaning steps you performed.

::::::::::::::::::::::::::::::::::::::::::::::::::