Managing Files

Part 1: Working with the Contents of Files

Let's work with the file /tmp/matrix.c, a short excerpt from an MPI matrix program written in C. To follow along, paste the listing below into /tmp/matrix.c using your favorite text editor.

#include <stdio.h>
#include <sys/time.h>   /* struct timeval */
#include "mpi.h"

#define N               4        /* number of rows and columns in matrix */

MPI_Status status;
double a[N][N],b[N][N],c[N][N];

int main(int argc, char **argv)
{
  int numtasks,taskid,numworkers,source,dest,rows,offset,i,j,k;
  struct timeval start, stop;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  numworkers = numtasks-1;

  /* the rest of the program is omitted from this excerpt */

  MPI_Finalize();
  return 0;
}

  head -n 3 /tmp/matrix.c > /tmp/matrix_head.txt

This writes the first three lines of /tmp/matrix.c to a new file, /tmp/matrix_head.txt. The > operator redirects the output into that file, overwriting it if it already exists.

Without the -n option, head /tmp/matrix.c prints the first ten lines to the terminal.

  tail -n 6 /tmp/matrix.c > /tmp/matrix_tail.txt

This writes the last six lines of /tmp/matrix.c to /tmp/matrix_tail.txt. Like head, tail prints ten lines when you omit -n.
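A minimal sketch for experimenting with head and tail, assuming a POSIX shell with awk and mktemp -d available (both are standard on Linux and macOS). Instead of /tmp/matrix.c it generates a scratch file of 20 numbered lines (all file names here are hypothetical), so the results are easy to predict. It also shows one way to combine the two commands to pull a range of lines out of the middle of a file.

```shell
#!/bin/sh
set -eu

# Scratch directory, removed automatically when the script exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A 20-line sample file: "line 1" through "line 20".
awk 'BEGIN { for (i = 1; i <= 20; i++) print "line " i }' > "$work/lines.txt"

head -n 3 "$work/lines.txt" > "$work/head.txt"    # lines 1-3
tail -n 6 "$work/lines.txt" > "$work/tail.txt"    # lines 15-20
head "$work/lines.txt" > "$work/default.txt"      # no -n: first 10 lines

# Lines 5-8: keep the first 8 lines, then the last 4 of those.
head -n 8 "$work/lines.txt" | tail -n 4 > "$work/middle.txt"
cat "$work/middle.txt"
```

Piping head -n 8 into tail -n 4 works because head stops after line 8 and tail then keeps only the last four of the lines it receives.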

  more /tmp/matrix.c

This displays /tmp/matrix.c one screen at a time. Press Space to move to the next screen and q to quit.

  less /tmp/matrix.c

less also displays the file one screen at a time, but it lets you scroll both backward and forward with the up and down arrow keys. Press q to quit.

  wc /tmp/matrix.c
     21      64      367  /tmp/matrix.c

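The three columns give the number of lines (21), words (64), and bytes (367) in /tmp/matrix.c; for a plain ASCII file, the byte count equals the character count. Your numbers will differ if your copy of the file is not identical to the one measured here.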

To count several files at once, list their names separated by spaces, for example wc file1 file2 file3. wc prints one line of counts per file, followed by a line with the totals.

To see only the size in bytes of each JPEG image in the current directory, plus the total for all of them, use the -c option: wc -c *.jpg.
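A small sketch, assuming a POSIX shell with mktemp -d, that checks wc against two hypothetical scratch files whose counts are known in advance. Redirecting a file into wc with < makes it print only the number, without a file name, which is convenient inside scripts.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical sample files with known contents.
printf 'one two\nthree\n' > "$work/a.txt"   # 2 lines, 3 words, 14 bytes
printf 'four five six\n' > "$work/b.txt"    # 1 line, 3 words, 14 bytes

# One line of counts per file, then a "total" line.
wc "$work/a.txt" "$work/b.txt"

# With input redirected, wc prints only the number; $(( )) trims any padding.
lines=$(( $(wc -l < "$work/a.txt") ))
bytes=$(( $(wc -c < "$work/b.txt") ))
echo "a.txt: $lines lines; b.txt: $bytes bytes"
```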

Part 2: Data Manipulation

Let's apply commands to filter, sort, group, match, and replace data in the file /tmp/data.txt. You may paste the contents of the file into /tmp/data.txt using your favorite text editor.

NAME START LOCATION END LOCATION cM SNPs Comments
Wendi 72017 5827331 12.43 1686 Match to Mom
Sheila 6514775 1500362 6.65 1089 Match to Mom
Michael 3793615 12596858 17.25 2785 Match to Dad or IBS
Robert 4090545 5115145 2.68 500 Mom but not me
Sheila 2514775 5600362 8.65 1189 Match to Mom
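
  sed 's/Sheila/Linda/g' /tmp/data.txt > /tmp/data.txt.bak

This replaces every occurrence of Sheila with Linda and writes the result to /tmp/data.txt.bak; /tmp/data.txt itself is not modified. The g flag at the end of the expression makes sed replace every match on a line; without it, only the first match on each line is replaced.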

Writing the changes to a separate file, rather than overwriting the original, lets you review them or compare them with the original before you replace anything.
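To see why the g flag matters, here is a minimal sketch, assuming a POSIX shell with mktemp -d, run on a hypothetical one-line file in which the name appears twice. It also confirms that redirecting sed's output leaves the input file untouched.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical input line that contains the name twice.
echo 'Sheila and Sheila' > "$work/names.txt"

# Without g: only the first match on each line is replaced.
sed 's/Sheila/Linda/' "$work/names.txt" > "$work/first.txt"
# With g: every match on each line is replaced.
sed 's/Sheila/Linda/g' "$work/names.txt" > "$work/all.txt"

cat "$work/first.txt"   # Linda and Sheila
cat "$work/all.txt"     # Linda and Linda
```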

  grep 'Match to Mom' /tmp/data.txt > /tmp/data.txt.2

This reads /tmp/data.txt and writes only the lines that contain "Match to Mom" to /tmp/data.txt.2.

  awk '{print $1,$6;}' /tmp/data.txt > /tmp/data_tab.txt

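This writes the first and sixth fields of each line, separated by a space, to /tmp/data_tab.txt. By default, awk splits each line into fields at runs of spaces and tabs, so the result is not quite the NAME and Comments columns. A multi-word comment such as "Match to Mom" occupies fields 6 through 8, so $6 holds only its first word (the Wendi row becomes Wendi Match). Likewise, START LOCATION and END LOCATION take up two fields each, so the header line becomes NAME cM.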

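To list only the rows whose cM value (the fourth field) is greater than 10, run awk '$4 > 10' /tmp/data.txt. The header line is printed too: its fourth field, END, is not a number, so awk compares it with 10 as text. Use awk 'NR > 1 && $4 > 10' /tmp/data.txt to skip it.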

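To list the rows that contain "Match to Mom" anywhere in the line, run awk '/Match to Mom/' /tmp/data.txt. A test such as $4 ~ /Match to Mom/ would look only in the fourth field, the cM value, and would match nothing.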
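The filtering commands above can be tried on a scratch copy of the sample data. This sketch, assuming a POSIX shell with awk and mktemp -d, runs the grep example, then uses an awk loop to print each name together with its full multi-word comment, and skips the header line with NR > 1. The loop assumes, as in the sample, that every data row has exactly five fields before the comment; the tab-separated output format is a choice made for this sketch.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Scratch copy of the sample data.
cat > "$work/data.txt" <<'EOF'
NAME START LOCATION END LOCATION cM SNPs Comments
Wendi 72017 5827331 12.43 1686 Match to Mom
Sheila 6514775 1500362 6.65 1089 Match to Mom
Michael 3793615 12596858 17.25 2785 Match to Dad or IBS
Robert 4090545 5115145 2.68 500 Mom but not me
Sheila 2514775 5600362 8.65 1189 Match to Mom
EOF

# grep: whole lines containing the phrase.
grep 'Match to Mom' "$work/data.txt" > "$work/data.txt.2"

# awk: name plus the full comment (fields 6..NF), tab-separated,
# skipping the header; assumes five fields precede the comment.
awk 'NR > 1 {
  comment = $6
  for (i = 7; i <= NF; i++) comment = comment " " $i
  print $1 "\t" comment
}' "$work/data.txt" > "$work/data_tab.txt"

# awk: data rows whose cM value (field 4) is above 10.
awk 'NR > 1 && $4 > 10' "$work/data.txt" > "$work/high_cm.txt"

cat "$work/data_tab.txt"
cat "$work/high_cm.txt"
```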

  sort -k 5n /tmp/data.txt

Here -k 5n sorts the lines by the fifth field (SNPs), and the n modifier compares the values as numbers rather than as text, so 500 sorts before 1089.

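To sort in descending order, add the r (reverse) modifier to the key: sort -k 5nr /tmp/data.txt. Attach r to the key rather than adding a separate -r option: a key that carries its own modifier, such as n here, ignores global ordering options, so sort -k 5n -r would still sort the SNPs in ascending order.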

To sort and also remove duplicate lines, use -u: sort -u /tmp/data.txt.

To sort a list by month name (JAN before FEB, and so on through DEC), use -M: sort -M /your/file.
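A sketch of these sort options on the sample data, assuming a POSIX shell with mktemp -d. It skips the header with tail -n +2 and uses -k 5,5 so that the key is only the fifth field rather than field 5 through the end of the line. LC_ALL=C is set so that the ordering and the month names do not depend on local language settings; the month list is hypothetical.

```shell
#!/bin/sh
set -eu
LC_ALL=C; export LC_ALL   # locale-independent ordering and month names

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

cat > "$work/data.txt" <<'EOF'
NAME START LOCATION END LOCATION cM SNPs Comments
Wendi 72017 5827331 12.43 1686 Match to Mom
Sheila 6514775 1500362 6.65 1089 Match to Mom
Michael 3793615 12596858 17.25 2785 Match to Dad or IBS
Robert 4090545 5115145 2.68 500 Mom but not me
Sheila 2514775 5600362 8.65 1189 Match to Mom
EOF

# Skip the header, then sort numerically on field 5 only (SNPs).
tail -n +2 "$work/data.txt" | sort -k 5,5n > "$work/by_snps.txt"
# Same key, largest first: r is attached to the key, next to n.
tail -n +2 "$work/data.txt" | sort -k 5,5nr > "$work/by_snps_desc.txt"
# Hypothetical month list, sorted by month name.
printf 'MAR\nJAN\nFEB\n' | sort -M > "$work/months.txt"

cat "$work/by_snps.txt"
```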

Part 3: Working with a Collection of Files

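Let's work with the files inside /tmp/test_files. They are empty, so you can create them with mkdir -p /tmp/test_files, then cd /tmp/test_files and touch test0.txt test1.txt test2.txt test3.txt test4.txt test5.txt test6.txt test66.txt. Listing the directory then shows something like this (your user name, group, and dates will differ):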

user:test_files x0y$ ls -l
total 0
-rw-r--r--  1 x0y  user  0 Apr 10 09:37 test0.txt
-rw-r--r--  1 x0y  user  0 Apr 10 09:37 test1.txt
-rw-r--r--  1 x0y  user  0 Apr 10 09:37 test2.txt
-rw-r--r--  1 x0y  user  0 Apr 10 09:37 test3.txt
-rw-r--r--  1 x0y  user  0 Apr 10 09:37 test4.txt
-rw-r--r--  1 x0y  user  0 Apr 10 09:37 test5.txt
-rw-r--r--  1 x0y  user  0 Apr 10 09:37 test6.txt
-rw-r--r--  1 x0y  user  0 Apr 10 09:37 test66.txt

  find . -name "*6*" -user x0y

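This searches the current directory (.) and all of its subdirectories for files whose names contain the digit 6 and that are owned by the user x0y; here, that matches test6.txt and test66.txt. Note that -user tests the file's owner, which is not necessarily the user who created it.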

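To search the whole system (/) for files whose names contain "conf" and that were last modified 7 days ago, run find / -name "*conf*" -mtime 7. Because find rounds a file's age down to whole days, -mtime 7 means between 7 and 8 days ago; use -mtime -7 for files modified within the last 7 days.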

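To keep the search on the filesystem where it starts, without descending into other mounted filesystems such as network shares, add -xdev: find / -xdev -name foo.bar -print. Place -xdev before tests such as -name; GNU find prints a warning when it comes after them.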
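A self-contained sketch, assuming a POSIX shell with mktemp -d, that recreates the eight empty test files in a scratch directory and runs the searches from this section. It uses the numeric ID of the current user (id -u), which -user also accepts, in place of x0y. Because the files were just created, -mtime -7 matches them while -mtime 7 does not.

```shell
#!/bin/sh
set -eu
LC_ALL=C; export LC_ALL   # byte-order sorting of the results

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Recreate the eight empty files in a scratch test_files directory.
mkdir "$work/test_files"
cd "$work/test_files"
touch test0.txt test1.txt test2.txt test3.txt test4.txt test5.txt test6.txt test66.txt

# Names containing 6, owned by the user running the script (instead of x0y).
find . -name "*6*" -user "$(id -u)" | sort > "$work/found.txt"
# Modified less than 7 days ago: both files qualify.
find . -name "*6*" -mtime -7 | sort > "$work/recent.txt"
# Modified 7 to 8 days ago: nothing qualifies yet.
find . -name "*6*" -mtime 7 > "$work/week_old.txt"

cat "$work/found.txt"
```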

Part 4: Comparing Differences Between the Contents of Files

For this section, we will use the files /tmp/test_files/test6.txt and /tmp/test_files/test66.txt. Paste the contents shown below into the two files using your favorite text editor, then run the commands from /tmp/test_files.

$ cat test6.txt
Weld I.D.    Material Grade        Segment Tested        Accepted
========================================================================
004          CS AH38               100%                  No
009          CS AH30                50%                  No
099          CS AH40                50%                  No
100          CS AH67               100%                  Yes

$ cat test66.txt
Weld I.D.    Material Grade        Segment Tested        Accepted
========================================================================
004          CS AH38               100%                  No
009          CS AH30                50%                  No
099          CS AH40                50%                  No
200          CS AH44                75%                  No
100          CS AH67               100%                  Yes

  diff test6.txt test66.txt
    5a6
    > 200          CS AH44                75%                  No

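The line 5a6 means that after line 5 of test6.txt you must add line 6 of test66.txt to make the files identical. The > marks a line that exists only in the second file; lines that exist only in the first file are marked with <.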

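To compare the files side by side, use -y. The --width (or -W) option limits the total width of that side-by-side output, for example diff -y --width=160 test6.txt test66.txt; without -y, --width has no effect.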

To check only whether the files differ, without listing the differing lines, use -q: diff -q test6.txt test66.txt. It prints a single line saying that the files differ, or nothing if they are identical.
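A sketch that rebuilds the two tables in a scratch directory and checks the diff output shown above, assuming a POSIX shell with mktemp -d. test66.txt is built from test6.txt with head and tail, so the extra row lands right after line 5. Note that diff exits with status 1 when the files differ (0 when they are identical, 2 on trouble), which matters in scripts that stop on errors.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

cat > test6.txt <<'EOF'
Weld I.D.    Material Grade        Segment Tested        Accepted
========================================================================
004          CS AH38               100%                  No
009          CS AH30                50%                  No
099          CS AH40                50%                  No
100          CS AH67               100%                  Yes
EOF

# test66.txt: the first 5 lines, the extra row, then the last line.
{ head -n 5 test6.txt
  echo '200          CS AH44                75%                  No'
  tail -n 1 test6.txt
} > test66.txt

# Capture diff's exit status instead of letting set -e stop the script.
status=0
diff test6.txt test66.txt > changes.txt || status=$?
cat changes.txt

# -q reports only whether the files differ.
diff -q test6.txt test66.txt > /dev/null || echo "test6.txt and test66.txt differ"
```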

Part 5: Compressing and Extracting Files

To create archives and compressed files with extensions such as .tar, .tar.gz, .tgz, .gz, .bz2, or .zip, use tar (which also extracts archives), gzip, bzip2, or zip.

  gzip test66.txt

This compresses test66.txt with gzip and replaces it with test66.txt.gz; the original file is removed. Run gunzip test66.txt.gz to get test66.txt back before trying the next examples.

  bzip2 -k test66.txt

This also compresses test66.txt, but the -k (keep) option leaves the original in place, so you end up with both test66.txt and test66.txt.bz2.

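To decompress the file and remove the .bz2 extension, run bzip2 -d test66.txt.bz2. If test66.txt still exists, for example because you used -k, bzip2 refuses to overwrite it; add -f to force it.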
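A round-trip sketch, assuming a POSIX shell with mktemp -d and gzip installed; the bzip2 part runs only if bzip2 is available. The file contents are hypothetical, and cmp confirms that decompressing gives back exactly the original bytes.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

printf 'line %s\n' 1 2 3 > test66.txt
cp test66.txt original.txt          # reference copy for comparison

# gzip replaces test66.txt with test66.txt.gz ...
gzip test66.txt
[ ! -e test66.txt ] && echo "gzip removed test66.txt"
# ... and gunzip restores it.
gunzip test66.txt.gz
cmp test66.txt original.txt && echo "gzip round trip OK"

# bzip2 -k keeps the original, so -d then needs -f to overwrite it.
if command -v bzip2 > /dev/null 2>&1; then
  bzip2 -k test66.txt
  bzip2 -d -f test66.txt.bz2
  cmp test66.txt original.txt && echo "bzip2 round trip OK"
fi
```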

  zip test.zip test66.txt test6.txt

This compresses test66.txt and test6.txt into a single archive file called test.zip.

To compress a directory, run zip -r squash.zip dir1. The -r option makes zip recurse into dir1, so the whole directory and everything in it goes into squash.zip.

To decompress, run unzip squash.zip; this extracts the archive into your current working directory.
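A round-trip sketch for zip and unzip, assuming a POSIX shell with mktemp -d. zip and unzip are separate programs that are often missing on minimal systems, so the script exits early if either is absent. unzip's -d option extracts into a named directory instead of the current one; the directory and file names are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

if ! command -v zip > /dev/null 2>&1 || ! command -v unzip > /dev/null 2>&1; then
  echo "zip or unzip is not installed; skipping"
  exit 0
fi

# Hypothetical directory with two small files.
mkdir dir1
echo 'weld data' > dir1/test6.txt
echo 'more weld data' > dir1/test66.txt

# Two files into one archive file (not a directory).
zip -q test.zip dir1/test6.txt dir1/test66.txt
# -r: the directory and everything under it.
zip -q -r squash.zip dir1

unzip -l squash.zip                  # list the archive without extracting
unzip -q squash.zip -d extracted     # extract into ./extracted instead of .
diff -r dir1 extracted/dir1 && echo "squash.zip matches dir1"
```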

  tar -cvf output.tar /dirname

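This bundles the /dirname directory into a single archive, or "tarball", named output.tar: -c creates an archive, -v lists each file as it is added, and -f gives the archive's name. A plain .tar file is not compressed; add -z (tar -czvf output.tar.gz /dirname) to compress it with gzip.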

To extract the content of output.tar, run tar -xvf output.tar.
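A round-trip sketch, assuming a POSIX shell with mktemp -d and a tar that supports -z (GNU tar and bsdtar both do). It archives a relative directory, hypothetically named dirname, rather than /dirname, so the paths stored in the archive are relative and extraction stays inside the target directory. Extracting with -C into a separate directory lets diff -r compare the result with the original.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A small hypothetical directory to archive.
mkdir -p dirname/sub
echo 'first' > dirname/a.txt
echo 'second' > dirname/sub/b.txt

# -c create, -v verbose, -f archive name; no compression.
tar -cvf output.tar dirname
# Same contents, compressed with gzip (-z).
tar -czf output.tar.gz dirname

# -t lists the archive's contents without extracting.
tar -tf output.tar

# Extract into a separate directory (-C) and compare with the original.
mkdir restored
tar -xzf output.tar.gz -C restored
diff -r dirname restored/dirname && echo "archive matches original"
```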