Linux: Convert a png file into a pdf file
Important: I have only tested with png files.
Another important note: this script needs the programs tesseract and pandoc, so you have to install both.
What is tesseract?
A commercial quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.
The website of tesseract.
What is pandoc?
This package provides a command-line executable that uses the pandoc library to convert between markup formats. For pdf output please also install pandoc-pdf or weasyprint.
What is pandoc-pdf?
This package pulls in the TeXLive latex package collection needed by pandoc to generate pdf output using pdflatex. To use --latex-engine=xelatex or lualatex, install texlive-collection-xetex or texlive-collection-luatex respectively.
The website about pandoc
First check with
which tesseract
and
which pandoc
whether both programs are already installed on your computer.
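The same check can also be written as a small script. This is only a sketch: the function name missing_tools is my own choice, and it uses command -v instead of which, because command -v is built into the shell.

```shell
#!/bin/bash
# Print the name of every given command that is not found in PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

missing=$(missing_tools tesseract pandoc)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names end up on one line
    echo "Not installed:" $missing
else
    echo "tesseract and pandoc are both installed"
fi
```

If the script reports a missing program, install it before you continue.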
Hint: on Fedora, tesseract and pandoc are in the "updates" repository.
To check whether you have this repository, look for its file, which is named "fedora-updates.repo":
cd /etc/yum.repos.d/ -> ls -> fedora-updates.repo
It should have been set up when Fedora was installed.
To install the programs tesseract and pandoc:
sudo dnf install tesseract pandoc
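The script further down calls tesseract with -l deu, so the German language data has to be available as well. The following sketch checks this with tesseract --list-langs. The helper name lang_in_list is my own, and the package name in the comment is an assumption about the Fedora packaging, so please verify it with dnf search first.

```shell
#!/bin/bash
# Succeeds if the language code $1 is one of the lines read from stdin.
lang_in_list() {
    grep -qxF -- "$1"
}

if command -v tesseract >/dev/null 2>&1; then
    # --list-langs prints one language code per line after a header line
    if tesseract --list-langs 2>&1 | lang_in_list deu; then
        echo "German language data is installed"
    else
        echo "German language data is missing"
        # Assumption: on Fedora the package is called tesseract-langpack-deu
        # sudo dnf install tesseract-langpack-deu
    fi
else
    echo "tesseract is not installed"
fi
```

If your screenshots are in another language, replace deu with the matching code, for example eng.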
The script assumes that the png file is a screenshot that you have created with a screenshot tool.
On Fedora I use the program gnome-screenshot, because I use the GNOME desktop.
gnome-screenshot is in the "fedora" repository.
What is gnome-screenshot?
gnome-screenshot lets you take pictures of your screen.
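gnome-screenshot can also be started from a terminal. Here is a minimal sketch, assuming you want to select an area with the mouse and save it as a png file. The function names and the target directory are hypothetical, only the options -a (area) and -f (file) come from gnome-screenshot itself.

```shell
#!/bin/bash
# Build a timestamped file name such as screenshot-20240131-142500.png
screenshot_name() {
    echo "screenshot-$(date +%Y%m%d-%H%M%S).png"
}

# Let the user select an area and save it as a png file in directory $1.
take_area_screenshot() {
    local target
    target="$1/$(screenshot_name)"
    # -a: select an area with the mouse, -f: write the image to this file
    gnome-screenshot -a -f "$target" && echo "$target"
}
```

After loading these functions, take_area_screenshot "$HOME/Pictures" would save the screenshot and print the path of the new file. That path can then be given to the script.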
If everything is installed, you can use the following script.
Script
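#!/bin/bash
clear
echo "Welcome to extract from a screenshot to pdf"
echo

# Ask for the directory that contains the png file
echo -n "Enter the path to the directory: "
read -r pfad
if [ -d "$pfad" ]; then
    echo "$pfad exists"
    cd "$pfad" || exit 1
else
    echo "$pfad does not exist"
    exit 1
fi

# Ask for the png file inside that directory
echo -n "Enter the file name: "
read -r filename
if [ -f "$filename" ]; then
    echo "$filename exists"
else
    echo "$filename does not exist"
    exit 1
fi

echo -n "What is the name of the output file (without .pdf)? "
read -r output

# OCR the image with the German language data (-l deu) and collapse all
# whitespace so that the text becomes one paragraph
tesseract -l deu "$filename" stdout | tr -s '[:space:]' ' ' > 12.txt
# Convert the text file into a pdf file
pandoc 12.txt -o "$output.pdf"
rm 12.txt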
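If you want to convert several screenshots at once, answering the questions every time gets tedious. The following is a sketch of a non-interactive variant that takes the png files as arguments. The function names pdf_name and png_to_pdf are my own, the pdf gets the same name as the image, and unlike the script above the line breaks of the recognized text are kept, so pandoc decides where paragraphs start.

```shell
#!/bin/bash
# Derive the pdf name from the image name: scan.png -> scan.pdf
pdf_name() {
    echo "${1%.*}.pdf"
}

# OCR one image and write the recognized text into a pdf file.
# $1 = image file, $2 = OCR language (defaults to deu, as in the script above)
png_to_pdf() {
    local image="$1" lang="${2:-deu}" tmpdir status
    [ -f "$image" ] || { echo "$image does not exist" >&2; return 1; }
    # Use a private temporary directory instead of a fixed file name
    tmpdir=$(mktemp -d) || return 1
    tesseract -l "$lang" "$image" stdout > "$tmpdir/ocr.txt" &&
        pandoc "$tmpdir/ocr.txt" -o "$(pdf_name "$image")"
    status=$?
    rm -rf "$tmpdir"
    return $status
}

# Convert every file given on the command line
for image in "$@"; do
    png_to_pdf "$image" || exit 1
done
```

Saved under a hypothetical name such as png2pdf.sh, it could be called as ./png2pdf.sh shot1.png shot2.png and would create shot1.pdf and shot2.pdf next to the images.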