Linux: pdftotext – a tool to read PDF files in Bash (here: Fedora)
If you want to read a PDF file in Bash, on the Linux command line, you don't need a graphical program. You need the command pdftotext.
It is part of the poppler-utils package, which you have to install first.
What are the poppler-utils?
poppler-utils is a collection of helpful PDF commands for the command line.
Where to find the poppler-utils:
On Fedora, poppler-utils is in the updates repository:
dnf info poppler-utils
Name : poppler-utils
Epoch : 0
Version : 24.08.0
Release : 2.fc41
Architecture : x86_64
Installed size : 796.7 KiB
Source : poppler-24.08.0-2.fc41.src.rpm
From repository : updates
Summary : Command line utilities for converting PDF files
URL : https://poppler.freedesktop.org/
License : (GPL-2.0-only OR GPL-3.0-only) AND GPL-2.0-or-later AND LGPL-2.0-or-later AND LGPL-2.1-or-later AND MIT
Description : Command line tools for manipulating PDF files and converting them to other formats.
Vendor : Fedora Project
The output confirms that poppler-utils comes from Fedora's updates repository.
Installing poppler-utils on Fedora
sudo dnf install poppler-utils
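Before installing, it can be useful to check whether pdftotext is already available. The following is a minimal sketch, not part of poppler-utils: the helper name have_command is an assumption made for this example, and it relies only on the shell builtin command -v.

```shell
# Return success if the given command can be found by the shell.
# "have_command" is a hypothetical helper name chosen for this sketch.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command pdftotext; then
    echo "pdftotext found at: $(command -v pdftotext)"
else
    echo "pdftotext not found, install it with: sudo dnf install poppler-utils"
fi
```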
Which programs does this package contain?
You can list its contents with this command:
dnf repoquery -l poppler-utils
sven@fedora:~$ dnf repoquery -l poppler-utils
Updating and loading repositories:
Repositories loaded.
/usr/bin/pdfattach
/usr/bin/pdfdetach
/usr/bin/pdffonts
/usr/bin/pdfimages
/usr/bin/pdfinfo
/usr/bin/pdfseparate
/usr/bin/pdfsig
/usr/bin/pdftocairo
/usr/bin/pdftohtml
/usr/bin/pdftoppm
/usr/bin/pdftops
/usr/bin/pdftotext
/usr/bin/pdfunite
Here you can see pdftotext, the command we need for this example. To find out which packages provide a file named pdftotext, run:
dnf provides */pdftotext
bash-completion-1:2.16-1.fc41.noarch : Programmable completion for Bash
Repo : @System
Matched From :
Filename : /usr/share/bash-completion/completions/pdftotext
poppler-utils-24.08.0-2.fc41.x86_64 : Command line utilities for converting PDF files
Repo : @System
Matched From :
Filename : /usr/bin/pdftotext
bash-completion-1:2.16-1.fc41.noarch : Programmable completion for Bash
Repo : updates
Matched From :
Filename : /usr/share/bash-completion/completions/pdftotext
poppler-utils-24.08.0-2.fc41.x86_64 : Command line utilities for converting PDF files
Repo : updates
Matched From :
Filename : /usr/bin/pdftotext
bash-completion-1:2.13-2.fc41.noarch : Programmable completion for Bash
Repo : fedora
Matched From :
Filename : /usr/share/bash-completion/completions/pdftotext
Why pdftotext also appears in bash-completion
When you ran dnf provides */pdftotext, you saw both poppler-utils and bash-completion in the output. This is a crucial detail.
poppler-utils provides the actual executable program at /usr/bin/pdftotext.
bash-completion provides the autocompletion script at /usr/share/bash-completion/completions/pdftotext.
This means that pressing the Tab key after pdftotext suggests its options, which is a big help. Command names are completed by Bash itself: if you type pdf and press Tab twice, Bash shows you all commands that begin with pdf.
Here is an example on my computer:
sven@fedora:~$ pdf [tab] [tab]
pdf2dsc pdfatfi pdfdetach pdffonts pdfimages pdfjadetex pdflatex-dev pdfseparate pdf-stapler pdftk pdftohtml pdftops pdfunite pdf2ps pdfattach pdfetex pdfgrep pdfinfo pdflatex pdfroff pdfsig pdftex pdftocairo pdftoppm pdftotext pdfxmltex
The list also includes pdftotext.
How does pdftotext work in Bash?
I have created a PDF file from my own blog and named it "GitHub_Reisen_und_IT.pdf".
If you only want to see the PDF files in the current directory, type:
ls *.pdf
This pattern does not show the folders in your directory. It lists only files whose names end in .pdf.
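One edge case: if a folder's own name ends in .pdf, ls *.pdf lists the contents of that folder as well. The sketch below prints only regular files and prints nothing when no PDF exists. The function name list_pdfs is an assumption made for this example.

```shell
# Print the names of regular files ending in .pdf in a directory
# (default: the current directory). "list_pdfs" is a hypothetical name.
list_pdfs() {
    dir="${1:-.}"
    for f in "$dir"/*.pdf; do
        # Skip folders and the unexpanded pattern when nothing matches.
        if [ -f "$f" ]; then
            printf '%s\n' "${f##*/}"
        fi
    done
}

# Usage: list the PDF files in the current directory.
list_pdfs
```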
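The command
pdftotext PDFfile.pdf -
The - sign after the PDF file name is important. Without it, pdftotext prints nothing to the terminal and writes the text to a file named PDFfile.txt instead.
The - is not Bash syntax but a convention that pdftotext and many other commands follow: it stands for the standard output (stdout), so the text is sent to the terminal.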
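Because the text goes to stdout, it can be piped into other commands. As a minimal sketch, the following function searches the extracted text with grep. The name pdf_grep is an assumption made for this example, and the line numbers it reports refer to the extracted text, not to PDF pages.

```shell
# Search the text of a PDF for a pattern, ignoring case, and show
# matching lines with their line numbers in the extracted text.
# "pdf_grep" is a hypothetical helper name chosen for this sketch.
pdf_grep() {
    pattern="$1"
    file="$2"
    pdftotext "$file" - | grep -i -n -e "$pattern"
}

# Usage (PDFfile.pdf is the placeholder file name used in this post):
#   pdf_grep fedora PDFfile.pdf
```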
If you have a large PDF file, it is useful to pipe the output into a pager like less.
pdftotext PDFfile.pdf - | less
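For large files it may also help to read only a few pages. The sketch below wraps the pdftotext options -f (first page), -l (last page) and -layout (keep the physical layout) in a small function. The name pdf_pages and its argument order are assumptions made for this example.

```shell
# Print only the pages from first to last of a PDF, keeping the
# physical layout of the text. "pdf_pages" is a hypothetical name;
# -f, -l and -layout are options described in man pdftotext.
pdf_pages() {
    file="$1"
    first="$2"
    last="$3"
    pdftotext -f "$first" -l "$last" -layout "$file" -
}

# Usage: read pages 2 to 3 in the pager.
#   pdf_pages PDFfile.pdf 2 3 | less
```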
Conclusion:
As you can see, pdftotext is a powerful tool for reading PDF files in Bash. If you want to learn about more options, read the manual of pdftotext:
man pdftotext