A "hash" is a generic term for a function that, given the same input, always produces the same output, and whose input cannot be reconstructed by looking at the output. For example, a very simple hash of the input "23" would be "5" (adding the digits together). In this case you cannot reconstruct "23" by analysing "5" because other inputs also produce "5" ("41", for example). Although you cannot reconstruct the input, you can verify that a given input matches the hash: hash the candidate input and compare the result with the stored hash.
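The digit-adding example can be sketched in a few lines of Python. The names `toy_hash` and `matches` are made up for illustration, and this toy is nowhere near a usable hash; it only shows repeatability, a collision, and verification.

```python
def toy_hash(text: str) -> str:
    """Toy 'hash': add the decimal digits of the input together.

    Only accepts strings of digits; anything else raises ValueError.
    """
    return str(sum(int(ch) for ch in text))


def matches(candidate: str, stored_hash: str) -> bool:
    """Verify a candidate input by hashing it and comparing with the stored hash."""
    return toy_hash(candidate) == stored_hash


stored = toy_hash("23")
print(stored)                 # 5
print(toy_hash("41"))         # 5 (a different input, same output)
print(matches("23", stored))  # True
print(matches("24", stored))  # False
```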
Hashes are typically used in applications to store passwords. If the password database is stolen and the passwords are hashed, the thief can't feasibly reconstruct the original passwords and use them in the future. So as well as the thief not having anything useful, it means you don't need to tell all the customers to change their passwords. Because you're only storing the hash of a password, this does mean you can't tell customers what their password is; you can only change it.
When two inputs produce the same output in a hash this is called a collision. Earlier, "23" was hashed as "5", but "41" would also be "5". Collisions in hashes are to be expected; a mere collision is no big deal. However, the output of a good hash is also supposed to have some other properties. If a good hash produces a value between 0000 and 9999, it shouldn't produce twice as many outputs between 8000 and 9999 as in any other range of the same size; the output should be equally distributed across the output space. Also, a good hash given the inputs "password" and "pbssword" (which vary by only a single letter) should produce very different outputs. This should apply to all inputs, not just typical inputs.
For example, the MD5 hash of "password" is 5F4DCC3B5AA765D61D8327DEB882CF99, whereas the MD5 hash of "pbssword" is 65ADD8ADCD26EA1AF12B05F67FD50B97.
This change in output is important because if I already know that 5F4DCC(...) is the hash of "password", this shouldn't help me work out that a similar-looking hash such as 5F4DCB(...) came from an input near the word "password".
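To see how far apart the two outputs are, here is a small sketch using Python's standard `hashlib` module. The helper names `md5_hex` and `differing_bits` are made up for this example; the bit count is just one rough way of measuring how different two hashes are.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as uppercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


a = md5_hex("password")
b = md5_hex("pbssword")
print(a)  # 5F4DCC3B5AA765D61D8327DEB882CF99
print(b)
print(differing_bits(a, b), "of 128 bits differ")
```

For a well-behaved hash you would expect roughly half of the bits to differ, even though the inputs differ by a single letter.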
Finally, a good hash should make it computationally infeasible to construct an input that produces a given output. If a hash's output varies only between 1 and 9 then generating a possible input is easy, so this is not a good hash. With good hashes like SHA-1 it would take years of computer time to construct a possible input (as currently known). Unless you're a cryptographer, use an established and well-researched hash.
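As a rough illustration of why a tiny output space is a problem, the sketch below invents a weak hash whose output is always between 1 and 9 and then finds a possible input by simply trying numbers in order. Both `tiny_hash` and `find_possible_input` are hypothetical names used only here.

```python
from itertools import count


def tiny_hash(number: int) -> int:
    """Hypothetical weak hash: maps any positive integer to a value from 1 to 9."""
    return 1 + (number - 1) % 9


def find_possible_input(target: int) -> int:
    """Try 1, 2, 3, ... until some input hashes to target (target must be 1 to 9)."""
    for candidate in count(1):
        if tiny_hash(candidate) == target:
            return candidate


stolen = tiny_hash(123456)
print(stolen)                       # 3
print(find_possible_input(stolen))  # 3 (found within at most nine attempts)
```

The input found is not the original 123456, but it hashes to the same value, which is all an attacker needs to pass a hash comparison.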
When using hashes in your applications it's important to remember that hashes are repeatable (given the same input you'll get the same output) and that there are precalculated databases of popular inputs and their resulting hashes (popular passwords, English words, etc.). These databases could be used for good or bad, but as you're hashing your passwords in the first place you'll be wanting to defeat these databases.
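A precalculated database is, in essence, a lookup table from hash to input. The sketch below builds a very small one; the word list is a made-up stand-in for the much larger lists real databases contain, and `look_up` is a hypothetical helper name.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text as lowercase hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical word list standing in for a large precalculated database.
popular_inputs = ["password", "letmein", "cat", "qwerty"]
precalculated = {sha1_hex(word): word for word in popular_inputs}


def look_up(stolen_hash: str):
    """Return the popular input that produced stolen_hash, or None if it is not listed."""
    return precalculated.get(stolen_hash)


print(look_up(sha1_hex("cat")))        # cat
print(look_up(sha1_hex("saltedcat")))  # None
```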
One method of defeating such databases is by salting your input. For example, before hashing the word "password" you might combine it with the salt "salted" to get "saltedpassword", producing a hash that the precalculated database is less likely to have (it'll have the hash for "cat" but it's less likely to have "saltedcat"). You can either add your salt at the beginning, or intermingle its letters with the input. You may even decide to add specific bytes onto your input so your salt isn't English. Salting could be done application-wide, or per-user (have a salt column in your user table).
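Here is a minimal sketch of per-user salting, under a few assumptions that are not from the text above: the salt is 16 random bytes, it is added at the beginning of the password, and SHA-256 is the hash. The function names and the dictionary standing in for a row of the user table are hypothetical.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> str:
    """Hash the salt followed by the password (salt added at the beginning)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def new_user_record(password: str) -> dict:
    """Build the two values a hypothetical user table would store per user."""
    salt = os.urandom(16)  # random bytes, so the salt is not English text
    return {"salt": salt, "password_hash": hash_password(password, salt)}


def check_password(candidate: str, record: dict) -> bool:
    """Re-hash the candidate with the stored salt and compare the digests."""
    actual = hash_password(candidate, record["salt"])
    return hmac.compare_digest(actual, record["password_hash"])


alice = new_user_record("password")
bob = new_user_record("password")
print(check_password("password", alice))  # True
print(check_password("pbssword", alice))  # False
# Same password, different salts, so the stored hashes differ.
print(alice["password_hash"] == bob["password_hash"])
```

Because each user has their own salt, two users with the same password end up with different stored hashes, and a precalculated database would have to be rebuilt for every salt.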
Popular hashes are MD5, SHA-0, and SHA-1. Recently MD5 and SHA-0 were shown to be significantly easier to break computationally than previously believed.
So use SHA-1 (or greater) as your hash and keep an eye out for newly discovered flaws in hashing algorithms. And as hashes vary in length, be sure that your database column is wide enough to hold the whole hash, so that it doesn't truncate your good work in protecting passwords.
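To check how wide a database column needs to be, you can measure the digest length directly. This sketch assumes the hash is stored as hexadecimal text; `hex_digest_length` is a made-up helper name.

```python
import hashlib


def hex_digest_length(algorithm: str) -> int:
    """Return the number of hex characters in a digest from the named algorithm."""
    return len(hashlib.new(algorithm, b"password").hexdigest())


for name in ("md5", "sha1", "sha256"):
    print(name, hex_digest_length(name))
# md5 32
# sha1 40
# sha256 64
```

A column sized for MD5 (32 characters) would silently cut off a SHA-1 or SHA-256 hash in some databases, so size the column for the hash you actually use.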