Readme how it works.txt
This code is only a front end for a program that you need to install on your system first.
The program is called Tesseract; see here:
https://github.com/tesseract-ocr/tesseract
This has only been tested on Windows 10.
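To illustrate what "a front end for Tesseract" means in Python, here is a minimal sketch that OCRs a single image through the pytesseract wrapper. It assumes the pytesseract and Pillow packages are installed. The function name ocr_image is hypothetical and is not taken from this project; the --tesseract_path option described further down mentions pytesseract, so the script may work in a similar way, but this is not its actual code.

```python
from typing import Optional


def ocr_image(image_path: str, tesseract_cmd: Optional[str] = None) -> str:
    """OCR one image file with Tesseract (via pytesseract) and return the text."""
    # Imported inside the function so this module loads even without the packages.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Use an explicit tesseract.exe instead of looking it up on PATH.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)
```

For example, ocr_image("output/frame_0.jpg") would return the text Tesseract recognizes in that (hypothetical) frame image.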
To install it, I used the precompiled .exe version from UB Mannheim (the university library of Mannheim, Germany):
https://github.com/UB-Mannheim/Tesseract_Dokumentation/blob/main/Tesseract_Doku_Windows.md
There I found the download page for the Windows installer:
https://digi.bib.uni-mannheim.de/tesseract/
This is the direct download link for the version 5.2 Windows installer:
https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.2.0.20220712.exe
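Once Tesseract is installed and its folder has been added to Path as described next, you can check from Python that it is actually reachable. This is a minimal sketch using only the standard library; the helper names find_tesseract and tesseract_version are hypothetical. Keep in mind that a changed Path only applies to terminals opened afterwards.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable found on PATH, or None."""
    # On Windows, shutil.which also tries the PATHEXT extensions such as .exe.
    return shutil.which("tesseract")


def tesseract_version(exe: str) -> str:
    """Run `tesseract --version` and return the first line of its output."""
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    # Depending on the build, the version text may be on stdout or stderr, so check both.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract was not found on PATH; check the Path entry and open a new terminal.")
    else:
        print(f"Found {exe}: {tesseract_version(exe)}")
```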
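Then the Tesseract installation folder needs to be added to the system's environment variables.
To do that, type 'path' into the Windows search field. An entry for editing the environment variables should appear; click on it, find the Path variable, and add the folder where you installed Tesseract.
Path entries are folders, so add the folder that contains tesseract.exe, not the .exe itself. For example, if the executable is at "D:\69\tesseract.exe", add "D:\69".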
The quickest way to get results is:
1. Drop the video you want to extract the text from into the input folder.
2. Run Double_Click_Me_To_RUN_video_to_text_extractor_In_Terminal.bat
3. Press Enter at each prompt the terminal shows (Enter, Enter, Enter, Enter). This leaves video_path, output_name and frame_rate empty, so the default values are used.
4. The script then looks in the input folder for mp3 or mp4 files and processes each one. If there are multiple files in the folder, it processes all of them!
5. When it has finished, the extracted text is printed in the terminal and also saved in the output folder, together with the extracted images.
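Step 4 (collecting every file in the input folder) can be sketched as follows. This is a minimal sketch, assuming the folder is literally named input and that matching on the .mp4/.mp3 extension is enough; find_input_files is a hypothetical name, not the script's real function.

```python
from pathlib import Path
from typing import List

# Extensions the batch run picks up, per step 4 (compared case-insensitively).
MEDIA_EXTENSIONS = {".mp4", ".mp3"}


def find_input_files(input_folder: str = "input") -> List[Path]:
    """Return all .mp4/.mp3 files directly inside input_folder, sorted by path."""
    folder = Path(input_folder)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )
```

Each returned path would then be handed to the extraction step in a simple loop, for example `for video in find_input_files(): ...`.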
To change how often frames are taken from the video, do the same as above but enter your own value at the frame_rate prompt.
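Since frame_rate is given in seconds (with the default of 2, the example output below is labelled frame: 0, 2, 4, ...), the sampling can be thought of as one frame every frame_rate seconds. This is a minimal sketch of that mapping, assuming samples start at second 0 and the nearest frame index is used; sample_points is a hypothetical helper, and the real script's rounding may differ.

```python
from typing import List, Tuple


def sample_points(duration_s: float, fps: float, frame_rate_s: int = 2) -> List[Tuple[int, int]]:
    """Return (second, frame_index) pairs, one sample every frame_rate_s seconds."""
    if frame_rate_s <= 0 or fps <= 0:
        raise ValueError("frame_rate_s and fps must be positive")
    points = []
    t = 0
    while t < duration_s:
        # Nearest frame index for timestamp t.
        points.append((t, round(t * fps)))
        t += frame_rate_s
    return points
```

For a 7-second clip at 30 fps, sample_points(7, 30, 2) gives [(0, 0), (2, 60), (4, 120), (6, 180)]. With OpenCV, fps would typically come from cap.get(cv2.CAP_PROP_FPS) and the duration from cap.get(cv2.CAP_PROP_FRAME_COUNT) divided by fps.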
You can also use Double_Click_Me_To_OPEN_MiniConda_VENV_In_Terminal.bat.
It opens a terminal in which you can run extract_fames.py directly.
extract_fames.py takes these arguments:
parser.add_argument("--video_path", type=str, default=" ", help="Path to the video file")
parser.add_argument("--output_name", type=str, default="output.txt", help="Name of the output text file")
parser.add_argument("--frame_rate", type=int, default=2, help="Desired frame rate (in seconds), e.g. 1 means it takes an image from the video every second and extracts the text from it.")
parser.add_argument("--tesseract_path", type=str, default=default_tesseract_path, help="Custom tesseract path. If you have pytesseract installed, you can also use your custom pytesseract path; leave this empty and it will use its own.")
parser.add_argument("--headline", type=str, default=headline, help="Here you can add a headline that will appear at the top of the output text file, like headlines usually do.")
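The argument list above can be turned into a self-contained parser to see how a command line is interpreted. This is a sketch, not the script itself: default_tesseract_path is not shown in this readme, so the value below is a hypothetical placeholder (the usual install location of the UB Mannheim installer), and the headline default is the one described further down. Note that argparse rejects unknown flags such as --output_text_file; the output file is set with --output_name.

```python
import argparse

# Placeholder for the script's default_tesseract_path (assumption, not from the readme).
DEFAULT_TESSERACT_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Default headline as described in this readme.
DEFAULT_HEADLINE = ("can you extract all code from the following text, "
                    "fix it and print the complete code")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Extract text from video frames with Tesseract.")
    # A blank video_path presumably means "scan the input folder" (assumption).
    parser.add_argument("--video_path", type=str, default=" ", help="Path to the video file")
    parser.add_argument("--output_name", type=str, default="output.txt", help="Name of the output text file")
    parser.add_argument("--frame_rate", type=int, default=2, help="Take one frame every N seconds")
    parser.add_argument("--tesseract_path", type=str, default=DEFAULT_TESSERACT_PATH, help="Path to tesseract.exe")
    parser.add_argument("--headline", type=str, default=DEFAULT_HEADLINE, help="First line of the output file")
    return parser


if __name__ == "__main__":
    print(vars(build_parser().parse_args()))
```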
To run it in the Miniconda venv (virtual environment), you can do:
python extract_fames.py --video_path "D:\47\example_video.mp4" --output_name "output.txt" --frame_rate 2 --headline "this is a headline"
The headline will be the top line of the output.txt file. It defaults to "can you extract all code from the following text, fix it and print the complete code",
which I used to get Python code out of a video with an LLM by pasting the complete output.txt into the prompt.
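The layout of output.txt, as seen in the example below (headline first, then a dashed header line with the video path and frame label, then the recognized text), can be reproduced with a short writer. This is a minimal sketch; write_output is a hypothetical helper and the length of the dashed line is chosen for illustration only.

```python
from pathlib import Path
from typing import Iterable, Tuple

# Row of dashes similar to the one in the example output (length is illustrative).
SEPARATOR = "-" * 66


def write_output(output_path: str, headline: str, video_path: str,
                 frame_texts: Iterable[Tuple[int, str]]) -> None:
    """Write the headline first, then one header line plus OCR text per sampled frame."""
    lines = [headline]
    for second, text in frame_texts:
        # Header identifying the video and the sampled position.
        lines.append(f"{SEPARATOR} Video: {video_path} frame: {second}")
        # OCR text for that frame, without leading/trailing blank lines.
        lines.append(text.strip())
    Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Pasting the resulting file into an LLM prompt then puts the headline (the instruction) directly above the collected frame texts.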
-------------------------EXAMPLE TEXT FROM A VIDEO----------------------------------------------------------------------------------------------------------------------------
can you extract all code from the following text, fix it and print the complete code
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 0
‘_—rs
Write a short summary of the Image description above:
The image is a depiction of Taylor Swift, a renowned American singer-songwr
iter and musician.
Ask a question about the image or type 'exit' to quit: what is she wearing?
> what is she wearing?
Image description:Taylor Swift She is wearing a green dress.
Write a short summary of the Image description above:
There's a captivating image that has caught our attention! It depicts none o
ther than Taylor Swift, who is looking radiant in her green dress. The color
choice complements her perfectly, making her stand out with grace and style
Ask a question about the image or type 'exit' to quit: what color is her eye
Sz
> what color is her eyes?
Image description:Taylor Swift She is wearing a green dress. Her eyes are bl
ue.
Write a short summary of the Image description above:
This image features pop sensation Taylor Swift, sporting a stunning green g
own. Her captivating blue eyes are visible in this snapshot, adding to her o
verall charm and appeal.
Ask a question about the image or type 'exit' to quit: does she have earring
?
> does she have earring?
Image description:Taylor Swift She is wearing a green dress. Her eyes are bl
ue. Yes, she has earring in her left ear.
Write a short summary of the Image description above:
Hey there! So I've got this image to describe for you. It's Taylor Swift, th
e singer we all know and love! In the picture, she is wearing a stunning gre
en dress that really brings out her eyes. Her eyes are blue - they're so vib
rant they could be the ocean on a sunny day. And don't forget those earrings
in her left ear; they add just the right touch of sparkle to an already stu
nning look!
Ask a question about the image or type 'exit' to quit: |
cE}
some other information:
Tokens are not cut up exactly where the words start or end;
tokens can include trailing spaces and even sub-words.
Here are some helpful rules of thumb for
understanding tokens in terms of lengths:
1 token ~= 4 chars in English. 1 token ~= ¾ words. 100 tokens ~= 75 words.
Just to put the number of "32k tokens" into somewhat estimated context.
32k tokens, 3/4 of 32k is 24k words,
each page average is 500 or 0.5k words,
so that's basically 24k / . 5k = 24 x 2 =~48 pages.
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 2
100% Local Tiny Al Vision Language Model (1.6B) - Very Impressive!
IB CAUsers\kris\Pythoriwstreamn\moondream\batchpy - Notepad++
Fle E& Search View Encoding Language Stings Toole Macro Run Pligine_ Window 7
@e°B@96/ XO |> (aajag SIBVWOKESO w
vio, py Beane vatepy BA newt
from typing import List
import requests
import ffmpeg
import subprocess
import json
from moviepy.editor import CompositeAudioClip
from moviepy.editor import concatenate_audioclips
import cv2
import time
from pathlib import Path
from faster_whisper import WhisperModel
from moviepy.editor import VideoFileClip, AudioFileClip
# ANSI escape code for colors
PINK = '\@33[95m'
CYAN = '\@33[96m'
YELLOW = '\033[93m'
NEON_GREEN = '\@33[92m"
RESET_COLOR = '\@33[@m'
# Initialize the OpenAI client with the API key
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
sdef mistral7b(user_input):
f streamed_completion = client.chat.completions.create(
model="local-model",
messagec=[
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 4
100% Local Tiny Al Vision Language Model (1.6B) - Very Impressive!
IB CAUsers\kris\Pythoriwstrearn\moondream\batch.py - Notepad++
Fle E§ Search View Encoding Language Stings Toole Macro Run Pligine Window 7
Ge B8@96/ XO |> \aajae SIBVWOKEE © w
iso, py OT eae E vatepy BIA newt
from typing import List
import requests
import ffmpeg
import subprocess
import json
from moviepy.editor import CompositeAudioClip
from moviepy.editor import concatenate_audioclips
import cv2
import time
from pathlib import Path
from faster_whisper import WhisperModel
from moviepy.editor import VideoFileClip, AudioFileClip
# ANSI escape code for colors
PINK = '\@33[95m'
CYAN = '\@33[96m'
YELLOW = '\033[93m'
NEON_GREEN = '\033[92m"
RESET_COLOR = '\@33[@m'
# Initialize the OpenAI client with the API key
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
de¥ mistral7b(user_input):
streamed_completion = client.chat.completions.create(
model="local-model",
messagec=l
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 6
BE CALs \Pytorivsteammoondeam\ batch -Netpad+
Fle Est Search View Encoding Language Stings Tole Macro un Plugin Win
@e e996 x0 b Qaaag STE WO KOS >
en 7 OTE wot O14 rent 8
22 | CYAN = '\@33[96m'
23 | YELLOW = '\@33[93m"
24 | NEON_GREEN = ‘'\@33[92m'
25 | RESET_COLOR = '\033[@m'
27 | # Initialize the OpenAI client with the API key
28 | client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
30 |cdef |mistral7b(user_input):
31 streamed_completion = client.chat.completions.create(
32 model="local-model",
33 |f messages=[
34 {"role": "system", "content": "Yau are a great writer."},
B5 {"role": "user", "content": user_input}
36 ],
37 stream=True # Enable streaming
38 )
ais)
40 full_response = ""
41 line_buffer = ""
43 for chunk in streamed_completion:
44 delta_content = chunk.choices[@].delta.content
46 if delta_content is not None:
47 line_buffer += delta_content
AR
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 8
BY CAUser sis \Pythonstreammcondreambatehpy - Notepads - 0
Fle Es Search View Encoding Language Stings Toole Macro Run Plgine_ Window 7 2
ae 8899 x0 (9 aajag SIBVmOKES © ®
iso, oy Bama walepy GIF newt
52 print(NEON GREEN + line + RESET_COLOR)
53 full_response += line + ‘\n'
54 line_buffer = lines[-1]
56 | if line_buffer:
57 print(NEON_GREEN + line_buffer + RESET_COLOR)
58 full_response += line_buffer
ae
60 return full_response
62 | # ANSI escape codes for colors, for styling the terminal output
63 | PINK = '\@33[95m'
64 | CYAN = '\@33[96m' i
65 | # ANSI escape codes for colors, for styling the terminal output
66 | PINK = '\@33[95m'
67 | CYAN = '\@33[96m'
68 | YELLOW = '\033[93m'
69 |;-NEON_GREEN = ‘\@33[92m"
7 ||RESET_COLOR = '\033[@m'
TA
72 | def process_images(folder_path: str) -> (TextModel, list):
73 model_path = snapshot_download("“vikhyatk/moondream1")
74 vision_encoder = VisionEncoder(model_path)
75 text_model = TextModel(model_path)
77 descriptions = [] # This will hold descriptions of all images
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 10
BY CAUser ss \Pythonstreammcondreambtchpy Notepads
Fle Es Search View Encoding Language Stings Toole Macro Run Pligine_ Window 7
ae B@9s XO |> \aajae SIBVWOKEEO vw
io, py Beare vatepy BA newt
UA
9a
CYAN = ‘\033[96m'
YELLOW = '\933[93m'
NEON_GREEN = ‘\033[92m’
RESET_COLOR = '\933[0m'
sdef process_images(folder_path: str) -> (TextModel, list):
model_path = snapshot_download("vikhyatk/mdondreams")
vision_encoder = VisionEncoder(model_path)
text_model = TextModel(model_path)
descriptions = [] # This will hold descriptions of all images
# Iterate over each file in the folder
for filename in os.listdir(folder_path):
i if filename.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', ‘.gif')):
image_path = os.path.join(folder_path, filename)
image = Image.open(image_path)
# Optionally, display the image
# image. show()
image_embeds = vision_encoder (image)
descriptions. append(description)
return text_model, descriptions
sdef convert_mp4_to_mp3(mp4_file_path, mp3_file_path):
description = text_model.answer_question(image_embeds, “Identify the person in the image with thie
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 12
RB CAUser ss \Pythonatreayncondreambatehpy - Notepads
Fle Es Search View Encoding Language Stings Toole Macro Run Phigine_ Window 7
45° BH8e99 x0 (9 aajag 5 IBV mOKES © ca
vio, oy Oakey valepy BIA newt
aol}
edef
descriptions.append(description)
return text_model, descriptions
convert, mp4 to) mp3(mp4_file_path, mp3_file_path):
Convert an MP4 file to MP3 format using moviepy.
Args:
mp4_file_path (str): Path to the MP4 file.
mp3_file_path (str): Desired output path for the MP3 file.
Returns:
bool: True if conversion successful, False otherwise.
try:
# Load the MP4 file
video_clip = VideoFileClip(mp4_file_path)
# Extract audio from the video clip
audio_clip = video_clip.audio
# Write the audio to an MP3 file
audio_clip.write_audiofile(mp3_file_path)
# Close the clips
viden clin.close()
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 14
Be CAUseni ytorvstemimoondeamstchgy Notepads
Fle Est
Search View Enceding Language Settings Toole Macro Run Plugine Vin
Window 7
ae 88099 x0 |9 aajag 5 IBV meKaS © w
iso, py Beach E valepy A newt
alas)
ALAS)
116 ff
dials)
alps?)
1:23
tbady
131 fF
# Write the audio to an MP3 file
audio_clip.write_audiofile(mp3_file_path)
# Close the clips
video_clip.close()
audio_clip.close()
return True
except Exception as e:
print(f"An error occurred: {e}")
return False
def transcribe_chunk(model, file_path):
segments, info = model.transcribe(file_path, beam_size=7)
transcription = ‘ ‘'.join(segment.text for segment in segments)
return transcription
# Function to extract frames from video
‘def extract_frames(video_ path, output_folder, frame_interval=60):
if not os.path.exists(output_folder):
os.makedirs(output_folder)
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
nrint("Frror: Could nat anen viden.")
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 16
IB CAUsers\irs,\Python'ystream\moondream\batchpy - Notepad++
Fle Es Search View Encoding Language Stings Toole Macro Run Phigine_ Window 7
a5 8899 x0 | aajag Ss IBV mOKES © »
iso, py Beachy vatepy BF newt
116 fp
allt)
dle
ilsyy
audio_clip.close()
return True
except Exception as e:
print(f"An error occurred: {e}")
return False
sdef transcribe chunk(model, file_path):
segments, info = model.transcribe(file_path, beam_size=7)
transcription = ° ‘.join(segment.text for segment in segments)
return transcription
# Function to extract frames from video
def extract_frames(video_path, output_folder, frame_interval=60):
if not os.path.exists(output_folder):
os.makedirs(output_folder)
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
print("Error: Could not open video.")
return []
frame_count = @
extracted_frame_paths = []
while cap.isOpened():
ret. frame = can-.read()
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 18
IB CAUser2\irs, Python ystream\moondream\batchpy -Notepads-+
Fle Es Search View Encoding Language Stings Toole Macro Run Phigine_ Window 7
ae° 8899 x0 | aajag Ss IBVmOKkKES © »
iso, py Beachy valepy BF newt
116 fp
allt)
122.
2H
dls}
ilsyy
audio_clip.close()
return True
except Exception as e:
print(f"An error occurred: {e}")
return False
sdef transcribe _chunk(model, file_path):
segments, info = model.transcribe(file_path, beam_size=7)
transcription = ' '.join(segment.text for segment in segments)
return transcription
# Function to extract frames from video
sdef extract _frames(video_path, output_folder, frame_interval=60):
if not os.path.exists(output_folder): I
os.makedirs(output_folder)
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
print("Error: Could not open video.")
return []
frame_count = @
extracted_frame_paths = []
while cap.isOpened():
ret. frame = can-read()
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 20
WB CAUsers\irs, Python ystream\moondream\batchpy - Notepad++
Fle Es Search View Encoding Language Stings Toole Macro Run Phigine_ Window 7
a5 °BH8e9 x0 |9 aajag 5 IBV MOKES © »
iso, py Beare valepy BF newt
Ten
116 fp
abil)
122.
ilsyy
audio_clip.close()
return True
except Exception as e:
print(f"An error occurred: {e}")
return False
sdef transcribe_chunk(model, file_path):
segments, info = model.transcribe(file_path, beam_size=7)
transcription = ' '.join(segment.text for segment in segments)
return transcription
# Function to extract frames from video
sdef extract_frames(video_path, output_folder, frame_interval=60):
a if not os.path.exists(output_folder): :
os.makedirs(output_folder)|
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
print("Error: Could not open video.")
return []
frame_count = @
extracted_frame_paths = []
while cap.isOpened():
ret. frame = can-.read()
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 22
IB CAUser2\irs, Python ystream\moondream\batchpy Notepad++
Fle Es Search View Encoding Language Stings Toole Macro Run Phigine_ Window 7
a5 BH8e9 x0 | aajag 5 IBV mOKES © »
iso, py Beare valepy BF newt
Ten
116 fp
abil)
122.
ilsyy
audio_clip.close()
return True
except Exception as e:
print(f"An error occurred: {e}")
return False
‘def transcribe_chunk(model, file_path):
segments, info = model.transcribe(file_path, beam_size=7)
transcription = ' '.join(segment.text for segment in segments)
return transcription
# Function to extract frames from video
sdef extract_frames(video_path, output_folder, frame_interval=60):
a if not os.path.exists(output_folder): :
os.makedirs(output_folder)
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
print("Error: Could not open video.")
return []
frame_count = @
extracted_frame_paths = []
while cap.isOpened():
ret. frame = can-.read()
------------------------------------------------------------------ Video: D:\47\02\video_to_text_extractor\input\example_video.mp4 frame: 24
IB CAUsers\eis\Pythorysteamymoondh 7
Fle Eat Search View Encoding Language Settings Toole Macro Run Plugins Wi
Ge BH9e XO (9 aajag er Pat ra
vision. py @ batchpy BD valkpy OF new! @
142 | if frame_count % frame_interval =
143 frame_filename = f' {output ONC EATETETOU {frame_count}.jpg'
144 cv2.imwrite(frame_filename, frame)
145 extracted_frame_paths.append(frame_filename)
147 frame_count += 1
149 cap.release()
150 print(f"Frame extraction complete. {len(extracted_frame_paths)} frames extracted.")
151 return extracted_frame_paths
153 | # Main execution
154 video_path = 'C:/Users/kris_/Python/vstream/moondream/gram2.mp4'
155 | output_folder = "C:/Users/kris_/Python/vstream/moondream/images”
156 | mp3_file_path = "C:/Users/kris_/Python/vstream/moondream/sound.mp3"
157 | convert_mp4_to_mp3(video_path, mp3_file_path)
159 \sdef main():
160 # Choose your model settings
161 #model_size = "medium.en"
162 #model = WhisperModel(model_size, device="cuda", compute_type="float16")
163 #transcription = transcribe_chunk(model, mp3_file_path)
164 #print (transcription)
166 frame_paths = extract_frames(video_path, output_folder)
162 # Pracess imanes in the folder and ohtain descrintians