Perl One-liners
Over the last few days I have enjoyed how easy Perl makes the little jobs on a computer. In this post I describe two one-liners for the dull jobs in life: removing blank lines from a file and pretty-printing XML documents.
Removal of blank lines from a file
The problem: you have a file that contains a lot of empty lines (or lines filled with nothing but whitespace) and you want to clean it up so that only the lines carrying information remain.
The solution:
perl -ni.orig -e 'print unless /^\s*$/;' somefile.txt
The details: as the perlrun documentation describes, the -n switch makes perl loop over all input lines and execute the script provided with the -e switch for each of them. The perlrun documentation also states that any additional command-line arguments are treated as the names of files that contain the input lines for the script. Therefore, we can simply add the name of the file that needs to be cleaned as an argument to the command-line call.
In the example, the script provided with the -e switch prints every line that does not consist solely of whitespace characters. Perl's unless keyword is quite useful in these circumstances, as it expresses "if not" in a more natural way. In principle, though, the following, even shorter, line does the same trick.
perl -ni.orig -e 'print if !/^\s*$/' somefile.txt
Without the -i switch, perl prints the result to standard output rather than back into the file. To really clean the file, -i is required. The switch takes an optional argument that is appended to the file name as a suffix: perl renames the original file to the extended file name (somefile.txt.orig in the example) and stores the output of the script under the old file name. If the suffix is omitted, no backup copy is kept.
Pretty-printing XML files
The problem: you have an XML file that contains the whole document on a single line. No human being wants to read XML like that, so you want to pretty-print it to make it readable.
The solution:
perl -MXML::LibXML -e 'print XML::LibXML->new->parse_file($ARGV[0])->toString(1);' somefile.xml
The details: XML::LibXML offers a quick way to pretty-print XML files. Calling perl with the -MXML::LibXML switch loads the XML::LibXML module. The script of the -e switch first instantiates an XML::LibXML parser object, then parses the file that is passed as an argument into a DOM structure and immediately serialises the DOM back into a string by using the toString() function. toString() takes one parameter that controls pretty-printing of the DOM structure. By default, XML::LibXML prints the DOM in the same way as it was originally parsed. However, if 1 is passed as a parameter to toString(), XML::LibXML adds line breaks and indentation so that humans can easily read the XML. Finally, the script prints out the result.
If your XML is not well-formed, the parser stops with an error message that tells you where to look for the problem.
The module is not part of the core perl distribution and has to be installed via CPAN. Additionally, XML::LibXML depends on a library called libxml2, which does the actual work for XML::LibXML. This library needs to be installed on your system too. However, the major Linux distributions ship the module pre-packaged, so you may find it already installed on your system.
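A quick way to see the effect is to feed the one-liner a tiny single-line document. The sketch below assumes that XML::LibXML is installed (it skips the demo otherwise) and uses a made-up file called sample.xml.

```shell
# Only run the demo when XML::LibXML can be loaded.
if perl -MXML::LibXML -e 1 2>/dev/null; then
    tmpdir=$(mktemp -d)
    cd "$tmpdir" || exit 1
    # sample.xml is a made-up document with everything on one line.
    printf '<list><item>one</item><item>two</item></list>\n' > sample.xml

    # Parse the file and serialise it again with formatting switched on.
    pretty=$(perl -MXML::LibXML -e 'print XML::LibXML->new->parse_file($ARGV[0])->toString(1);' sample.xml)
    printf '%s\n' "$pretty"
else
    pretty=""
    echo "XML::LibXML is not installed; nothing to show"
fi
```

To keep the result, redirect standard output to a new file. Do not redirect it onto the input file itself, because the shell empties that file before perl gets a chance to read it.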
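The error behaviour can be tried out in the same way. This sketch again assumes that XML::LibXML is installed and uses a made-up file, broken.xml, in which one tag is never closed. The exact wording of the message depends on the installed libxml2, so none is shown here.

```shell
# Only run the demo when XML::LibXML can be loaded.
if perl -MXML::LibXML -e 1 2>/dev/null; then
    tmpdir=$(mktemp -d)
    cd "$tmpdir" || exit 1
    # broken.xml is a made-up document whose <item> tag is never closed.
    printf '<list><item>one</list>\n' > broken.xml

    # Capture the parser's message from standard error and perl's exit status.
    message=$(perl -MXML::LibXML -e 'print XML::LibXML->new->parse_file($ARGV[0])->toString(1);' broken.xml 2>&1)
    status=$?
    printf 'exit status: %s\n%s\n' "$status" "$message"
fi
```

Because perl exits with a non-zero status when parsing fails, the one-liner can also serve as a simple well-formedness check in a shell script.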
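Whether the module is already there can be checked by simply asking perl to load it. The following is a minimal sketch; the variable name libxml_status is invented for this example, and the distribution package names in the hints are assumptions that may differ on your system.

```shell
# Report whether XML::LibXML can be loaded, and which version if so.
if version=$(perl -MXML::LibXML -e 'print XML::LibXML->VERSION' 2>/dev/null); then
    libxml_status="installed"
    echo "XML::LibXML $version is available"
else
    libxml_status="missing"
    echo "XML::LibXML is missing; install it with one of:"
    echo "  cpan XML::LibXML"
    echo "  apt-get install libxml-libxml-perl   # assumed Debian/Ubuntu package name"
    echo "  dnf install perl-XML-LibXML          # assumed Fedora package name"
fi
```

Using the distribution's package usually pulls in libxml2 as well, whereas a CPAN installation expects the libxml2 library and its development headers to be present already.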