The naming of files

When dealing with a large amount of information it is important to be able to find the specific data you want quickly and easily. Although the search facilities on many computer platforms are now efficient and effective, knowing where to look in the first place will always save time.

One of the best ways to ensure you can do this is to devise a consistent file and folder naming system and stick to it. This will help you avoid confusion in future, allow you to expand the data set in a predictable manner and, most importantly, ensure that you can quickly find what you are looking for.

It is perfectly acceptable to split data up between a group of related folders which are then stored within a parent folder (which could itself be stored within a grandparent folder).

The folders should be named using a consistent, human-comprehensible schema. Where possible they should be named so that they fall into a meaningful order when sorted alphabetically (or alphanumerically). For example, subfolders for a project could be named to group the data by year (e.g. 2007, 2008, 2009 etc.), each of which could then contain subfolders to further group the data by month (e.g. 01, 02, 03 etc.). This means that the path to a file both contains useful at-a-glance information about the data therein and is to some extent guessable.
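As a minimal sketch of the year and month layout described above, the following Python helper builds the folder path for a given date. The function name dated_folder and the root folder name research-data are illustrative assumptions, not part of any established scheme.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_folder(root: str, day: date) -> PurePosixPath:
    """Return root/YYYY/MM for the given date, with a zero-padded month."""
    return PurePosixPath(root) / f"{day.year:04d}" / f"{day.month:02d}"


# A file dated 23 March 2008 belongs in the 2008/03 subfolder.
print(dated_folder("research-data", date(2008, 3, 23)))  # research-data/2008/03
```

Zero-padding the month (03 rather than 3) is what keeps the month folders in calendar order when they are sorted by name.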

Creating a data structure such as this from the start ensures that you don’t need to move data later on, which can be time-consuming and risky when large, complex files are involved.

Sorting

The above example illustrates how useful it is to name files and folders in such a way that they are sorted in a meaningful manner. If you wish to sort them by date, it is best to include the date in the file or folder name as follows:

YYYY-MM-DD or YYYYMMDD

where YYYY is the full year, MM is the month (from 01 for January to 12 for December) and DD is the day of the month (from 01 to 31). Elements of the date can be omitted if the folder structure already indicates them. For example, a file stored in a folder named for the year it refers to need only be named for month and day (e.g. research-data/2008/0323.txt).
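The short Python sketch below shows why this ordering of year, month and day is useful: names built this way fall into chronological order under a plain alphabetical sort. The helper name dated_name and the sample dates are made up for illustration.

```python
from datetime import date


def dated_name(day: date, suffix: str = ".txt", compact: bool = False) -> str:
    """Build a file name in YYYY-MM-DD form, or YYYYMMDD when compact is True."""
    pattern = "%Y%m%d" if compact else "%Y-%m-%d"
    return day.strftime(pattern) + suffix


# Sample dates, deliberately out of order.
days = [date(2008, 12, 1), date(2008, 3, 23), date(2007, 11, 5)]

# A plain string sort puts the names in chronological order.
names = sorted(dated_name(d) for d in days)
print(names)  # ['2007-11-05.txt', '2008-03-23.txt', '2008-12-01.txt']
```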

This illustrates another important consideration: avoid repeating information in file and folder names. Keeping paths as short and simple as possible makes them easier to print or send in an email.

Version numbers can also be used. When starting a version schema, make sure you give yourself enough room for future expansion, for example by naming the first iteration v001 rather than v1. This leaves room for 998 further versions (up to v999) rather than just 8 (up to v9) before an extra digit is needed and the names stop sorting in order.
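To see the effect of padding, the Python sketch below compares how padded and unpadded version labels sort. The helper name versioned_name, the three-digit default width and the file stem report are assumptions chosen for illustration.

```python
def versioned_name(stem: str, version: int, width: int = 3, suffix: str = ".txt") -> str:
    """Return a name such as report-v001.txt, padding the version with zeros."""
    if not 1 <= version < 10 ** width:
        raise ValueError(f"version must be between 1 and {10 ** width - 1}")
    return f"{stem}-v{version:0{width}d}{suffix}"


# Without padding, version 10 sorts before version 2.
unpadded = sorted(f"v{n}" for n in (1, 2, 10))
print(unpadded)  # ['v1', 'v10', 'v2']

# With padding, the sorted order matches the version order.
padded = sorted(f"v{n:03d}" for n in (1, 2, 10))
print(padded)  # ['v001', 'v002', 'v010']

print(versioned_name("report", 1))  # report-v001.txt
```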

What not to use

It is best to restrict filenames to alphanumeric characters and the dash, as other characters may not be supported on all platforms or may be reserved for use by the operating system.

Never use spaces, underscores or periods (aside from the period used to separate the filename from the file extension). If you do need to separate words, use a dash character (-). This improves human readability and can have additional benefits for documents placed on the web, since many search engines will treat words separated by dashes as discrete keywords.
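One way to apply these rules to existing names is sketched below in Python. The function safe_name is a hypothetical helper, and it assumes that everything after the last period is the file extension; it is a starting point rather than a complete solution.

```python
import re


def safe_name(filename: str) -> str:
    """Replace runs of non-alphanumeric characters in the stem with one dash."""
    # Assumption: everything after the last period is the file extension.
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        # No period at all, so the whole name is the stem.
        stem, extension = filename, ""
    stem = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-")
    extension = re.sub(r"[^A-Za-z0-9]", "", extension)
    return f"{stem}.{extension}" if extension else stem


print(safe_name("field notes_2008.final.txt"))  # field-notes-2008-final.txt
```

Note that this sketch also replaces accented and other non-ASCII letters with dashes, which may or may not be what you want for your own files.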

Case sensitivity

File names on some platforms are case-sensitive, whereas on others they are not. In order to avoid any potential problems this may cause, it is best to use lowercase only when naming files and folders.
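The kind of problem this avoids can be illustrated with a short Python sketch that finds names differing only by case. Such names can sit side by side on a case-sensitive platform but would clash on one that is not. The function name case_collisions and the sample names are hypothetical, and the comparison uses simple lowercasing, which only approximates how real file systems compare names.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that become identical once converted to lowercase."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the groups where more than one original name collides.
    return {key: found for key, found in groups.items() if len(found) > 1}


print(case_collisions(["Report.txt", "report.txt", "data.csv"]))
# {'report.txt': ['Report.txt', 'report.txt']}
```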
