Semaphores are used to protect critical regions of code or data structures. Remember that each access of a critical piece of data such as a VFS inode describing a directory is made by code running on behalf of a process. It would be very dangerous to allow one process to alter a critical data structure that is being used by another process. One way to achieve this would be not to call the scheduler whilst the critical piece of data is being accessed but this is a simplistic approach that would not give very good system performance. After all, there may only be one process in the system that is actually interested in the critical piece of data, why penalize other processes? Instead Linux uses semaphores to allow just one process at a time to access critical regions of code and data; all other processes wishing to access this resource will be made to wait until it becomes free. The waiting processes are suspended, other processes in the system can continue to run as normal.
A Linux semaphore data structure contains the following information:
Suppose the initial count for a semaphore is 1, the first process to come along will see that the count is positive and decrement it by 1, making it 0. It now owns the resource and can use it. If another process wishes to access this resource it sees that the count is 0, meaning that it is in use. It decrements the count, increments the waiting count and puts itself onto the wait queue for the semaphore. It then calls the scheduler to schedule another process in the system to run as it cannot run until it gains control of this resource. Eventually the first process has finished with the resource and it increments the count. If the count is less than or equal to zero it means that one or more other processes (waiting processes, actually) waiting for the semaphore. In this case it normalizes the waiting and count fields by adding the number waiting to the count and setting the waiting count to zero. It then wakes up the first process on the semaphore's wait queue. The count is now positive and so the first process that runs can grab the resource. All of the other waiting processes will fail to allocate the resource and will wait again.