Converting Word documents to PDF and PDF files to JPG on Ubuntu

汉王 · linux · June 15, 2018

Environment setup

Language: Python 3

Install ImageMagick (the PDF-to-JPG conversion calls this tool internally)

apt-get install imagemagick

Install LibreOffice (used to convert Word documents to PDF files)

apt-get install libreoffice

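Install the Python libraries wand and Pillow (Pillow is the maintained fork of PIL and provides the PIL module; on Python 3 the package to install is Pillow, not PIL)

pip install wand

pip install Pillow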
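
Before running any conversion it can help to confirm that the command-line tools are actually on PATH. The following is a minimal sketch using only the standard library. The helper name check_tools is made up for illustration, and the executable names convert and libreoffice are assumptions about what the Ubuntu packages install.

```python
import shutil


def check_tools(names=('convert', 'libreoffice')):
    """Return {tool name: True/False} depending on whether it is on PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == '__main__':
    for tool, found in check_tools().items():
        print('%-12s %s' % (tool, 'found' if found else 'MISSING'))
```

A missing entry usually means the corresponding apt-get install step has not been run yet.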

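PDF to JPG

The PDF is converted to PNG first and then to JPG. This avoids black or transparent backgrounds, which would make the resulting image look different from the PDF.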

    import os

    from PIL import Image as Image2
    from wand.image import Image
    from wand.color import Color

    # Not defined in the original post: base URL prepended to the returned
    # paths. Placeholder value; set it to your own static file host.
    static_host = ''


    def convert_pdf_to_jpg(filename):
        # File name without directory or extension
        title = os.path.splitext(os.path.basename(filename))[0]

        # resolution is the rendering DPI; background is the background color
        with Image(filename=filename, resolution=150, background=Color('White')) as pdf:
            # Number of pages
            length = len(pdf.sequence)

            # With more than one page, the page index is appended to each
            # output file name (title-0.png, title-1.png, ...)
            with pdf.convert('png') as converted:
                converted.save(filename='static/local_images/%s.png' % title)

        if length == 1:
            image_list = ['static/local_images/%s.png' % title]
        else:
            image_list = ['static/local_images/%s-%d.png' % (title, i)
                          for i in range(length)]

        jpg_list = []
        for png_path in image_list:
            # Flatten onto a white background so transparent areas
            # do not turn black in the JPG
            image = Image2.open(png_path).convert('RGBA')
            background = Image2.new('RGB', image.size, (255, 255, 255))
            background.paste(image, (0, 0), image)

            name = os.path.splitext(png_path)[0] + '.jpg'
            background.save(name)
            os.remove(png_path)
            jpg_list.append('%s/%s' % (static_host, name))

        return jpg_list
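
Why the white background matters can be shown with the arithmetic for a single pixel. JPG has no alpha channel, so a transparent pixel has to be blended onto some background; simply dropping the alpha leaves whatever RGB value was stored, which is frequently black. Below is a rough sketch of the standard "source over background" blend in plain Python. The function name blend_over is hypothetical and is not part of PIL or wand.

```python
def blend_over(pixel, background=(255, 255, 255)):
    """Blend one RGBA pixel (0-255 per channel) onto an opaque background."""
    r, g, b, a = pixel
    alpha = a / 255.0
    return tuple(
        int(round(c * alpha + bg * (1.0 - alpha)))
        for c, bg in zip((r, g, b), background)
    )


transparent_black = (0, 0, 0, 0)      # a fully transparent pixel stored as black

print(transparent_black[:3])          # alpha dropped: (0, 0, 0), i.e. black
print(blend_over(transparent_black))  # blended onto white: (255, 255, 255)
```

Pasting the RGBA image onto a white canvas with the image itself as the mask performs this blend for every pixel.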

Word to PDF

Python has no library that converts Word documents to PDF directly, so LibreOffice has to be installed first and then invoked with a system call through the os module.

    import os
    import shlex


    def convert_doc_to_pdf(filename):
        # Quote the path so spaces and shell metacharacters are handled safely
        cmd = 'libreoffice --convert-to pdf %s' % shlex.quote(filename)
        os.system(cmd)

        # LibreOffice writes the PDF to the current working directory,
        # so only the base name is returned
        return os.path.splitext(os.path.basename(filename))[0] + '.pdf'
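
As an alternative to os.system, the same LibreOffice call can be made with the subprocess module, which passes the file name as a separate argument and makes the exit status easy to check. This is a sketch rather than the original author's method. The function names and the outdir parameter are invented here, while --headless, --convert-to and --outdir are existing LibreOffice command-line options.

```python
import os
import subprocess


def build_convert_command(filename, outdir='.'):
    """Build the argument list for a LibreOffice PDF conversion."""
    return ['libreoffice', '--headless', '--convert-to', 'pdf',
            '--outdir', outdir, filename]


def convert_doc_to_pdf_checked(filename, outdir='.'):
    """Convert filename to PDF in outdir and return the expected PDF path."""
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(build_convert_command(filename, outdir), check=True)
    base = os.path.splitext(os.path.basename(filename))[0]
    return os.path.join(outdir, base + '.pdf')
```

Because the arguments are passed as a list, a name such as "my report.docx" needs no quoting.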