Split command in Linux - In this article, we are going to study
split
command in Linux with which we can break a large file into smaller pieces.To demonstrate this, we create a file
testFile.txt
using seq
command in Linux. For those who do not know about seq
command, it prints sequence of numbers, which we would dump into a file. Lets do it. # Dump 'seq' output to 'testfile.txt' [root@LinuxBox split_test]# seq 10000 > testfile.txt # First 10 lines [root@LinuxBox split_test]$ head testfile.txt 1 2 3 4 5 6 7 8 9 10 # Last 10 lines [root@LinuxBox split_test]$ tail testfile.txt 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000
1. Basic use of split
The basic usage of any command is when it is not used with any option. In this case, we would supply file name as an argument or parameter to
split
command as shown below. When it gets executed, run ls
command to list the smaller parts of the file.[root@LinuxBox split_test]$ split testfile.txt # Large file has been split into number of smaller files [root@LinuxBox split_test]$ ls testfile.txt xaa xab xac xad xae xaf xag xah xai xaj
We could see a number of files with name in the format
x--
have been created. In order to make sure that they are the parts of the original file, we check the number of lines and even their contents.# Original file -> 10000 Lines # 10 parts -> 1000 lines each [root@LinuxBox split_test]$ wc -l * 10000 testfile.txt 1000 xaa 1000 xab 1000 xac 1000 xad 1000 xae 1000 xaf 1000 xag 1000 xah 1000 xai 1000 xaj 20000 total # Check the contents of first part [root@LinuxBox split_test]$ head xaa 1 2 3 4 5 6 7 8 9 10 # Check the contents of last file [root@LinuxBox split_test]$ tail xaj 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000 [root@LinuxBox split_test]$
In this case, file has been split into 10 smaller chunks based on number of lines, such that every chunk consists of 1000 lines. Instead, you might want the file to be split into a specific number of chunks, say 5 chunks (so that, every chunk will contain 2000 lines). Lets see how to do that.
2. Split a file in 'n' smaller parts - Option -n
We can define the number of parts a file should be split into using option
-n
. The syntax for this is split -n [No. of chunks] [file name]
. Lets create 5 chunks of our file testfile.txt
.# Specify the number of chunks [root@LinuxBox split_test]$ split -n 5 testfile.txt # There are 5 chunks created [root@LinuxBox split_test]$ ls testfile.txt xaa xab xac xad xae # Their sizes may vary [root@LinuxBox split_test]$ wc -l * 10000 testfile.txt 2177 xaa 1955 xab 1956 xac 1955 xad 1957 xae 20000 total # But, they contribute to the same information [root@LinuxBox split_test]$ head xaa 1 2 3 4 5 6 7 8 9 10 [root@LinuxBox split_test]$ tail xae 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000 [root@LinuxBox split_test]$
So, this command has created 5 chunks of the file for us, which might differ in their sizes, but eventually have same contents as in original file when put together. Next, we see how chunks can be created based on size of every chunk.
3. Split a file into chunks of equal sizes - Option -b
We've seen how file can be split based on number of lines and number of chunks, now we see how to split a file based on the size of every chunk so as to create chunks of equal sizes. For this, we use option
-b
as split -b [size] [file name]
, where the size must be mentioned in bytes.# We specify the chunk size to be 10000 bytes [root@LinuxBox split_test]$ split -b 10000 testfile.txt # It creates 5 chunks for us [root@LinuxBox split_test]$ ll total 108 -rw-r--r--. 1 root root 48894 Nov 11 14:09 testfile.txt -rw-r--r--. 1 root root 10000 Nov 11 14:38 xaa -rw-r--r--. 1 root root 10000 Nov 11 14:38 xab -rw-r--r--. 1 root root 10000 Nov 11 14:38 xac -rw-r--r--. 1 root root 10000 Nov 11 14:38 xad -rw-r--r--. 1 root root 8894 Nov 11 14:38 xae
We can see that there are 5 chunks created, 4 of which with a size of 10000 bytes and other with leftover data. Now that, we can split a file based on the size of each chunk, number of chunks and lines in each chunk. Lines? Not yet. 1000 lines is the default value and we can modify it as per our need.
4. Creating chunks with 'n' lines each - Option -l
With
-l
option of split
command, we can set the number of lines each chunk should contain. The syntax is the same, with different option this time. Lets split the file with each chunk having 1200 lines.# Specify the number of lines -> 1200 [root@LinuxBox split_test]$ split -l 1200 testfile.txt [root@LinuxBox split_test]$ ls testfile.txt xaa xab xac xad xae xaf xag xah xai [root@LinuxBox split_test]$ wc -l * 10000 testfile.txt 1200 xaa 1200 xab 1200 xac 1200 xad 1200 xae 1200 xaf 1200 xag 1200 xah 400 xai 20000 total # Verify their contents [root@LinuxBox split_test]$ head xaa 1 2 3 4 5 6 7 8 9 10 [root@LinuxBox split_test]$ tail xai 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000
5. Numeric suffixes - Option -d
We have seen that the names of the chunks created are alphabetical if the format
x--
, where -
is also an alphabet. We can change this to a digit so that it reads as x01
, x02
and so on (it makes more sense), using option -d
as below.# Numeric Suffixes [root@LinuxBox split_test]$ split -d testfile.txt [root@LinuxBox split_test]$ ls testfile.txt x00 x01 x02 x03 x04 x05 x06 x07 x08 x09
6. Suffix length - Option -a
We can also change the suffix length using option
-a
, so that x01
would read as x0001
if we specify suffix length = 4. Lets check this.# Suffix Length = 4 [root@LinuxBox split_test]$ split -d -a 4 testfile.txt [root@LinuxBox split_test]$ ls testfile.txt x0000 x0001 x0002 x0003 x0004 x0005 x0006 x0007 x0008 x0009
That's it! Thank you.
0 comments:
Post a comment