HOWTO

device drivers:


the major and minor numbers usually identifies the driver associated with a specific device and the minor, the number of the device into that driver. Nowadays different drivers can shar the same major number although.
device numbers are represented by the dev_t type, which holds both, the major and minor numbers of a device. it's a 32-bit which uses 12 bits to store the major number and 20 for the minor. The code should never assumption about the internal organization of the numbers, but, use the following macros:
MAJOR(dev_t dev); retrieve the major number
MINOR(dev_t dev); retrieve the minor number
or
MKDEV(int major, int minor); to retrieve the device itself that belongs to the major and minor here

to allocate and free device numbers:

int register_chrdev_region(dev_t first, unsigned int count, char *name); /* statically allocates a device number. first is the beginning device number of the range, usually 0 */
int alloc_chrdev_region(dev_t *dev, unsigned int firstminor, unsignet int count, char *name); /* Dynamically allocates the major number. *dev will hold the first number of the allocated range, and firstminor should be the requested first minor number to use */
void unregister_chrdev_region(dev_t first, unsigned int count); /* free device numbers, usually called from the module's cleanup function*/

These functions allocate device numbers for the driver's use, but they do not tell the kernel anything about what you will actually do with those numbers. Before allow user-space programs to access one of these device numbers, the driver needs to connect them to its internal functions that implement the device's operation.

Most of the fundamental driver operations involve three important kernel data structures: file_operations, file and inode

Kernel uses a struct of type struct cdev to represent a char device internally, and before the kernel invoke any device operation, one or mor cdev structures should be allocated and initialized.
<linux/cdev.h> contains all necessary structures and functios to lead with cdev.
There are two ways of allocating and initializind these structures:
- Stand alloc at runtime, like:
	struct cdev *my_cdev = cdev_alloc;
	my_cdev->ops = &my_fops;
- Embed the cdev structure within a device specific structure, which, in that case, the already allocated cdev structure should be initialized with:
	void cdev_init(struct cdev *cdev, struct file_operations *fops);
The cdev structure also contains a .owner field that should be set to THIS_MODULE.

Once the cdev structure is set up, the final step is to tell the kernel about it, with a call to:
	int cdev_add(struct cdev *dev, dev_t num, unsigned int count);
where:  dev is the cdev structure,
	num, is the first device number to which this device responds, and
	count is the number of device numbers that should be associated with the device.

Important things to keep in mind about using cdev_add:
	-This call can fail. If it returns a negative error code, the device has not been added to the system
	-As soon as cdev_add returns (with success), the device is "live", and its operations can be called by the kernel. So, make sure to not call cdev_add until the driver is completely ready to handle operations on the device.

To remove a char device from the system:
	void cdev_del(struct cdev *dev);

NEVER access the cdev structure after passes it to cdev_del :)

scull represents each device with a structure of type struct scull_dev, defined as:

	struct scull_dev {
		struct scull_qset *data; /* pointer to first quantum set */
		int quantum;		 /* the current quantum size */
		int qset;		 /* the current array size */
		unsigned long size;	 /* amount of data stored here */
		unsigned int access_key; /* used by sculluid and scullpriv */
		struct semaphore sem;	 /* mutual exclusion semaphore */
		struct cdev cdev;	 /* Char device structure */
	}

The open method 

is provided for the driver to init and prepare itself for later operations, where, in most cases the open method should perform:
	- Check for device-specific errors (such as device-not-ready or similar hardware problems)
	- Initialize the device if it is being opened for the first time
	- Update the f_op pointer, if necessary
	- Allocate and fill any data structure to be put in filp->private_data

First thing is to identify which device is being opened.  /* int (*open)(struct inode *inode, struct file *filp); */
The inode argument of the open method contains this information into the inode->i_cdev, but the driver works with scull_dev structures itself.
so, it uses the container_of() macro to search for the scull_dev structure that contains the specific cdev pointer:

container_of(ptr, type, member)

this macro will return the pointer to the structure of struct 'type', which contains into the 'member' field, a pointer to ptr, for example:

struct foo bar = { a = 5};

int *b = &a;

int c = container_of(b, struct foo, a);

/* int c will contain a pointer to a */

Another way to identify the device being opened, is looking at the minor number stored in the inode structure, that must be used if the device was registered with register_chrdev.
inode->iminor number is used to obtain the minor number, but, be sure that it corresponds to the device the driver is prepared to handle

The release method

The release method act as the reverse of open method. Is possible to find method implementation called device_close instead of device_release, and the method should perform the following tasks:
	- Deallocate anything that open allocated in filp->private_data
	- Shutdown the device on last close

What happens when a device file is closed more times than it is opened ? After all, the dup and fork system calls create copies of the open files without calling open; each of those copies is then closed at program termination.
Example: most program don't open their stdin device, but all of them end up closing it. How does the driver know when an open device file has really been closed ?

Not every close system call causes the release method to be invoked, only the calls that actually release the device data structure (so, the name's method). The file keeps a counter of how many times a file structure is being used. 
Neither fork nor dup creates a new file structure (only open does that), but they increment the counter in the existing structure, and the close system call just call the release method when the counter for the file structure drops to 0,
which happens when the structure is destroyed.

The flush flush methos is called every time an application calls close, but, very few derivers implements flush, because usually there is nothing to perform at close time, unless release is involved.

The previous discussion applies even when the application terminates without explicitly closing its open files, the kernel automatically closes any file at process exit time ( by internally using the close syscall).

READ and WRITE

both perform a similar task, copying data from and to application code. So, their prototypes are pretty similar:

ssize_t read(struct file *filp, char __user *buff, size_t count, loff_t *offp);
ssize_t write(struct file *filp, const char __user *buff, size_t count, loff_t *offp);

'filp' is the file pointer and 'count' is the size of the requested data transfer. The buff argument points to the user buffer holding the data to be written or the empty buffer where the newly read data should be placed. offp is a pointer to a long offset type object, that indicates the file position the user is accessing. The return value is a 'signed size type'.

the __user flag means a user-space data, so, in the above case, the *buff argument to read and write methods, is a user-space pointer. but, it can not be directly dereferenced by kernel code due the following restrictions:

- Depending on which architecture the driver is running on, and how the kernel was configured, the user-space pointer may not be valid while running in kernel mode at all. There may be no mapping for that address, or it could point to some other random data.
- Even if the pointer does mean the same thing in kernel space, user-space memory is paged, and the memory in question might not be resident in RAM when the system call is made. Attempting to reference the user-space memory directly could generate a page fault, which is not allowed in kernel code. The result would be an "oops", which would result in the death of the process that made the system call.
- The pointer in question has been supplied by a user program, which could be buggy or malicious. If the driver blindly dereference a user-supplied pointer, it provides an open doorway allowing a user-space program to access or overwrite memory anywhere in the system. So, never dereference an user-space pointer directly.

The access of user-space buffers should be performed by special, kernel-supplied functions, in order to be safe, defined in <asm/uaccess.h>. These functions use some special architecture-dependent magic to ensure that data transfers between kernel and user space happen in a safe and correct way.

The whole for read/write of scull needs to copy whole segment of data to or from the user address space, such capability is offered by the following kernel functions:

unsigned long copy_to_user(void __user *to, const void *from, unsigned long count);
unsigned long copy_from_user(void *to, const void __user *from, unsigned long count);

These functions behave like normal memcpy functions, with some extra care, once it access user-space from kernel code. The user pages being addressed might not be present in memory, and the virtual memory can put the process to sleep while the page is being transferred into place, for example, when the page must be retrieved from the swap space. The net result for the driver is that any function that accesses user space must be reentrant, must be able to execute concurrently with another driver functions, and, must legally sleep.

The role of these two functions is not limited to copy data from and to the user space, they also check if the user space pointer is valid or not. If the pointer is invalid, no copy is performed; or if an invalid address is encountered during the copy, only part of the data is copied. In both cases, the return value is the amount of memory to be copied (0 in case of all data was copied)

__copy_to_user and __copy_from_user functions can be used if not necessary to check if the user space pointer is valid or not.

Whatever the amount of data the methods transfer, they should generally update the file position at *offp, after a successful of the system call; The kernel then propagates the file position change back into the file structure when appropriate. The pread and pwrite syscalls operates from a given file offset and do not change the file position.

both read and write methods return a negative value if an error occurs. A value greater than or equal to 0, instead, tells the calling program how many bytes have been successfully transferred. If some data is transferred correctly and an error occurs, the return value must be the count of bytes successfully transferred, and the error does not get reported until the next time the function is called. It requires that the driver remember that the error has occurred.

The kernel functions return a negative nymber to signal an error. and the value of the number indicates the kind of the error that occurred, but, programs that run in user space always see -1 as the error value. They need to access the errno variable to find out what happened.The user space behavior is dictated by the POSIX standard, but that standard does not make requirements on how the kernel operates internally.

the return value for read is interpreted by the calling application program:

- if the value equals the count argument passed to the read system call, the requested number of bytes has been transferred.
- if the value is positive, but smaller than count, only part of the data has been transferred.
- if value is 0, end-of-file was reached (and no data was read).
- a negative value means there was an error, the value specifies what the error was. according to <linux/errno.h>.

the return value for write is interpreted by the calling application program:

- if the value equals count, the requested number of bytes has been transferred
- If the value is positive, but smaller than count, only part of the data has been transferred.
- If the value is 0, nothing was written. This result is not an error.
- A negative value means an error occurred like in read. Valid error value are defined in <linux/errno.h>